ACT Model for ALOHA Insertion Task

A lightweight Action Chunking with Transformers (ACT) model trained on the ALOHA simulation Insertion task. Insertion is a difficult bimanual coordination task, and its success rate is lower than on TransferCube (roughly 15-20% versus 35-42%).

Model Description

| Property | Value |
| --- | --- |
| Architecture | ACT (Action Chunking with Transformers) |
| Parameters | 52M |
| Task | AlohaInsertion-v0 (ALOHA simulation) |
| Training Steps | 200,000 |
| Batch Size | 32 |
| Success Rate | ~15% |

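Training Data

The model was trained on the lerobot/aloha_sim_insertion_human_image dataset, which contains 50 demonstration episodes.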

Task Description

The Insertion task requires a bimanual robot to:

  1. Pick up a socket with the left arm
  2. Pick up a peg with the right arm
  3. Insert the peg into the socket in mid-air

⚠️ This is a difficult task requiring precise bimanual coordination. The success rate is significantly lower than on TransferCube.

Training Environment

  • GPU: RTX A6000
  • Framework: LeRobot 0.4.3
  • Training Time: Around 13 hours

Usage

Installation

pip install lerobot gym-aloha

Training

lerobot-train \
    --policy.type=act \
    --dataset.repo_id=lerobot/aloha_sim_insertion_human_image \
    --env.type=aloha \
    --env.task=AlohaInsertion-v0 \
    --batch_size=32 \
    --steps=200000 \
    --eval.n_episodes=10 \
    --eval_freq=20000 \
    --save_freq=20000 \
    --output_dir=./outputs/act_aloha_insertion \
    --wandb.enable=false \
    --policy.push_to_hub=false

Evaluation

lerobot-eval \
    --policy.path=LeTau/act_aloha_insertion \
    --env.type=aloha \
    --env.task=AlohaInsertion-v0 \
    --eval.batch_size=1 \
    --eval.n_episodes=20

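Fine-tuning

The command below resumes from the saved training configuration and continues to 300,000 total steps: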

lerobot-train \
    --resume=true \
    --config_path=LeTau/act_aloha_insertion/train_config.json \
    --steps=300000

Results

| Evaluation | Episodes | Success Rate | Avg Sum Reward |
| --- | --- | --- | --- |
| Training (120K) | 10 | 10% | 40.3 |
| Training (200K) | 10 | 20% | 40.4 |
| Independent | 20 | 15% | 51.2 |

Expected success rate: 15-20%

Task Difficulty Comparison

| Task | Difficulty | Success Rate |
| --- | --- | --- |
| TransferCube | Easy | 35-42% |
| Insertion | Hard | 15-20% |

Detailed Evaluation Results (Independent)

Sum Rewards: [0.0, 0.0, 0.0, 240.0, 121.0, 0.0, 0.0, 0.0, 43.0, 0.0,
              256.0, 0.0, 0.0, 321.0, 0.0, 0.0, 0.0, 0.0, 43.0, 0.0]

Successes: 3/20 episodes
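
The summary figures in the Independent row of the Results table can be re-derived from the per-episode sum rewards listed above. The snippet below is a minimal sketch in plain Python; the variable names are illustrative, and the success count is copied from the reported 3/20 rather than computed, because this card does not state how success maps to sum reward.

```python
# Per-episode sum rewards from the independent evaluation (copied from above).
sum_rewards = [
    0.0, 0.0, 0.0, 240.0, 121.0, 0.0, 0.0, 0.0, 43.0, 0.0,
    256.0, 0.0, 0.0, 321.0, 0.0, 0.0, 0.0, 0.0, 43.0, 0.0,
]
# Assumption: taken from the reported "3/20", not derived from the rewards.
reported_successes = 3

n_episodes = len(sum_rewards)
avg_sum_reward = sum(sum_rewards) / n_episodes
success_rate = reported_successes / n_episodes
# Episodes that earned any reward at all, successful or not.
nonzero_episodes = sum(r > 0 for r in sum_rewards)

print(f"episodes: {n_episodes}")
print(f"avg sum reward: {avg_sum_reward:.1f}")
print(f"success rate: {success_rate:.0%}")
print(f"episodes with non-zero reward: {nonzero_episodes}")
```

This yields 20 episodes, an average sum reward of 51.2, and a 15% success rate, matching the Independent row. Six episodes earned some reward, but only three were counted as successes.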

Limitations

  • Difficult task: Insertion requires precise bimanual coordination
  • Limited training data: Only 50 demonstration episodes available
  • Low success rate: This is a baseline model for a challenging task
  • Single task: Only trained on Insertion, no multi-task capability

Citation

@article{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
  journal={arXiv preprint arXiv:2304.13705},
  year={2023}
}

Acknowledgments

  • LeRobot framework by Hugging Face
  • ALOHA project by Stanford