Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2406.09246

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Paper • 2304.13705 • Published Apr 23, 2023 • 6
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Paper • 2303.04137 • Published Mar 7, 2023 • 5
OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41
Temporal Difference Learning for Model Predictive Control

Paper • 2203.04955 • Published Mar 9, 2022 • 3

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41

Embodiment enables interaction of model with environment. Key is to anticipate what change could've come with its current action.

Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation

Paper • 2408.11812 • Published Aug 21, 2024 • 6
WebArena: A Realistic Web Environment for Building Autonomous Agents

Paper • 2307.13854 • Published Jul 25, 2023 • 25
Agent Workflow Memory

Paper • 2409.07429 • Published Sep 11, 2024 • 32
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12, 2024 • 48

Relevant-Papers-Midterm

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Paper • 2402.14848 • Published Feb 19, 2024 • 20
The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published Jun 6, 2024 • 68
CRAG -- Comprehensive RAG Benchmark

Paper • 2406.04744 • Published Jun 7, 2024 • 48
Transformers meet Neural Algorithmic Reasoners

Paper • 2406.09308 • Published Jun 13, 2024 • 44

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30, 2024 • 20
Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published May 1, 2024 • 10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28, 2024 • 29
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 121

FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published Jan 16 • 27
OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2 • 86
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
BitNet b1.58 2B4T Technical Report

Paper • 2504.12285 • Published Apr 16 • 75
FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published Jan 16 • 27

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Paper • 2411.19650 • Published Nov 29, 2024
Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published May 20, 2024 • 29
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression

Paper • 2412.03293 • Published Dec 4, 2024

Cognitively Inspired Energy-Based World Models

Paper • 2406.08862 • Published Jun 13, 2024 • 10
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51
OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 34

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Paper • 2304.13705 • Published Apr 23, 2023 • 6
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Paper • 2303.04137 • Published Mar 7, 2023 • 5
OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41
Temporal Difference Learning for Model Predictive Control

Paper • 2203.04955 • Published Mar 9, 2022 • 3

FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published Jan 16 • 27
OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2 • 86
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
BitNet b1.58 2B4T Technical Report

Paper • 2504.12285 • Published Apr 16 • 75
FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published Jan 16 • 27

Embodiment enables interaction of model with environment. Key is to anticipate what change could've come with its current action.

Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation

Paper • 2408.11812 • Published Aug 21, 2024 • 6
WebArena: A Realistic Web Environment for Building Autonomous Agents

Paper • 2307.13854 • Published Jul 25, 2023 • 25
Agent Workflow Memory

Paper • 2409.07429 • Published Sep 11, 2024 • 32
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12, 2024 • 48

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Paper • 2411.19650 • Published Nov 29, 2024
Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published May 20, 2024 • 29
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression

Paper • 2412.03293 • Published Dec 4, 2024

Relevant-Papers-Midterm

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Paper • 2402.14848 • Published Feb 19, 2024 • 20
The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published Jun 6, 2024 • 68
CRAG -- Comprehensive RAG Benchmark

Paper • 2406.04744 • Published Jun 7, 2024 • 48
Transformers meet Neural Algorithmic Reasoners

Paper • 2406.09308 • Published Jun 13, 2024 • 44

Cognitively Inspired Energy-Based World Models

Paper • 2406.08862 • Published Jun 13, 2024 • 10
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51
OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30, 2024 • 20
Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published May 1, 2024 • 10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28, 2024 • 29
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 121

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 34

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs