ACT Model for ALOHA Insertion Task
A lightweight Action Chunking with Transformers (ACT) model trained on the ALOHA simulation Insertion task. Insertion is a difficult bimanual coordination task, and the success rate here (about 15%) is well below the 35-42% listed for TransferCube.
Model Description
| Property | Value |
|---|---|
| Architecture | ACT (Action Chunking with Transformers) |
| Parameters | 52M |
| Task | ALOHA Insertion-v0 |
| Training Steps | 200,000 |
| Batch Size | 32 |
| Success Rate | ~15% |
Training Data
- Dataset: lerobot/aloha_sim_insertion_human_image
- Episodes: 50 human demonstrations
- Frames: 20,000
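As a rough sanity check on the scale of training, the dataset figures above can be combined with the batch size and step count from the model table. The sketch below assumes each training sample corresponds to one dataset frame, which glosses over how ACT groups actions into chunks; the variable names are illustrative only.

```python
# Figures taken from this model card.
episodes = 50
frames = 20_000
batch_size = 32
training_steps = 200_000

# Average demonstration length in frames.
frames_per_episode = frames // episodes

# Total samples drawn during training.
samples_seen = batch_size * training_steps

# Approximate passes over the dataset, assuming one sample == one frame.
approx_epochs = samples_seen / frames

print(f"frames per episode: {frames_per_episode}")
print(f"samples seen:       {samples_seen:,}")
print(f"approx. epochs:     {approx_epochs:.0f}")
```

Under that assumption each demonstration is 400 frames long and the policy revisits each frame roughly 320 times over the full run.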
Task Description
The Insertion task requires a bimanual robot to:
- Pick up a socket with the left arm
- Pick up a peg with the right arm
- Insert the peg into the socket in mid-air
⚠️ This is a difficult task that requires precise bimanual coordination. The success rate is significantly lower than on TransferCube.
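The evaluation numbers in this card are sums of a per-step reward, so it helps to see how a staged reward could map onto the steps above. The following is a minimal sketch, assuming the 0 to 4 staging used in the original ACT simulation code; the function name and boolean flags are hypothetical and do not reproduce the gym-aloha API.

```python
MAX_REWARD = 4  # assumed value of the final (success) stage


def insertion_stage_reward(
    left_touches_socket: bool,
    right_touches_peg: bool,
    socket_on_table: bool,
    peg_on_table: bool,
    peg_touches_socket: bool,
    peg_inserted: bool,
) -> int:
    """Return a staged reward in [0, 4] (hypothetical helper, for illustration)."""
    reward = 0
    both_touched = left_touches_socket and right_touches_peg
    both_lifted = not socket_on_table and not peg_on_table
    if both_touched:
        reward = 1  # both grippers touch their objects
    if both_touched and both_lifted:
        reward = 2  # both objects grasped and off the table
    if peg_touches_socket and both_lifted:
        reward = 3  # peg and socket in contact in mid-air
    if peg_inserted:
        reward = MAX_REWARD  # peg inserted; assumed to count as success
    return reward


# Peg and socket touching in mid-air, but not yet inserted.
print(insertion_stage_reward(True, True, False, False, True, False))
```

With a reward of this shape, an episode can accumulate a nonzero sum by grasping both objects without ever reaching the final stage, so a nonzero sum reward does not by itself imply success.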
Training Environment
- GPU: RTX A6000
- Framework: LeRobot 0.4.3
- Training Time: Around 13 hours
Usage
Installation
pip install lerobot gym-aloha
Training
lerobot-train \
--policy.type=act \
--dataset.repo_id=lerobot/aloha_sim_insertion_human_image \
--env.type=aloha \
--env.task=AlohaInsertion-v0 \
--batch_size=32 \
--steps=200000 \
--eval.n_episodes=10 \
--eval_freq=20000 \
--save_freq=20000 \
--output_dir=./outputs/act_aloha_insertion \
--wandb.enable=false \
--policy.push_to_hub=false
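To make the schedule implied by these flags concrete, the sketch below lists the steps at which evaluations and checkpoints would occur. It assumes both are triggered whenever the step count is a multiple of the corresponding frequency, which is an assumption about the trainer's behavior rather than something stated in this card.

```python
# Values copied from the training command above.
steps = 200_000
eval_freq = 20_000
save_freq = 20_000
eval_episodes = 10

# Assumption: an event fires at every multiple of its frequency, up to `steps`.
eval_steps = list(range(eval_freq, steps + 1, eval_freq))
checkpoint_steps = list(range(save_freq, steps + 1, save_freq))

# Total simulated episodes spent on in-training evaluation.
total_eval_episodes = len(eval_steps) * eval_episodes

print("evaluation steps:", eval_steps)
print("checkpoints:", len(checkpoint_steps))
print("in-training evaluation episodes:", total_eval_episodes)
```

Under this assumption the run produces ten evaluations and ten checkpoints, and the 120K and 200K evaluations reported in the Results table both fall on this grid.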
Evaluation
lerobot-eval \
--policy.path=LeTau/act_aloha_insertion \
--env.type=aloha \
--env.task=AlohaInsertion-v0 \
--eval.batch_size=1 \
--eval.n_episodes=20
Fine-tuning
lerobot-train \
--resume=true \
--config_path=LeTau/act_aloha_insertion/train_config.json \
--steps=300000
Results
| Evaluation | Episodes | Success Rate | Avg Sum Reward |
|---|---|---|---|
| Training (120K) | 10 | 10% | 40.3 |
| Training (200K) | 10 | 20% | 40.4 |
| Independent | 20 | 15% | 51.2 |
Expected success rate: 15-20%
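With only 10 or 20 evaluation episodes, these percentages carry wide uncertainty. As one way to quantify it, the sketch below computes a Wilson score interval for the independent evaluation (3 successes in 20 episodes). The choice of the Wilson interval and the 95% level are assumptions made here for illustration, not part of the original evaluation.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Independent evaluation from this card: 3 successes out of 20 episodes.
low, high = wilson_interval(3, 20)
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```

The interval spans roughly 5% to 36%, so the differences between the 10%, 15%, and 20% rows are well within sampling noise.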
Task Difficulty Comparison
| Task | Difficulty | Success Rate |
|---|---|---|
| TransferCube | Easy | 35-42% |
| Insertion | Hard | 15-20% |
Detailed Evaluation Results (Independent)
Sum Rewards: [0.0, 0.0, 0.0, 240.0, 121.0, 0.0, 0.0, 0.0, 43.0, 0.0,
256.0, 0.0, 0.0, 321.0, 0.0, 0.0, 0.0, 0.0, 43.0, 0.0]
Successes: 3/20 episodes
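The summary row for the independent evaluation can be reproduced from these raw values. The short sketch below does so; the success count is taken as reported, since it cannot be derived from the reward sums alone.

```python
# Per-episode sum rewards from the independent evaluation above.
sum_rewards = [0.0, 0.0, 0.0, 240.0, 121.0, 0.0, 0.0, 0.0, 43.0, 0.0,
               256.0, 0.0, 0.0, 321.0, 0.0, 0.0, 0.0, 0.0, 43.0, 0.0]
successes = 3  # as reported in this card

n_episodes = len(sum_rewards)
avg_sum_reward = sum(sum_rewards) / n_episodes
success_rate = successes / n_episodes

# Episodes that earned any reward at all (partial progress or success).
nonzero_episodes = sum(1 for r in sum_rewards if r > 0)

print(f"episodes:         {n_episodes}")
print(f"avg sum reward:   {avg_sum_reward:.1f}")
print(f"success rate:     {success_rate:.0%}")
print(f"nonzero episodes: {nonzero_episodes}")
```

Six of the 20 episodes earned some reward, but only three are counted as successes, so partial progress is about as common as a completed insertion.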
Limitations
- Difficult task: Insertion requires precise bimanual coordination
- Limited training data: Only 50 demonstration episodes available
- Low success rate: This is a baseline model for a challenging task
- Single task: Only trained on Insertion, no multi-task capability
Citation
@article{zhao2023learning,
title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
journal={arXiv preprint arXiv:2304.13705},
year={2023}
}