EXOKERN Skill v0.1.1 - Robust Peg Insertion Under Domain Randomization

skill-forge-peginsert-v0.1.1 is the domain-randomized reference model release in the EXOKERN catalog. It is trained on EXOKERN ContactBench v0.1.1 and ships the same paired comparison structure as v0:

  • full_ft_best_model.pt: primary checkpoint with 22D observations, including force/torque input
  • no_ft_best_model.pt: ablation checkpoint with the same architecture and 16D state-only observations

This release should be read as a robustness benchmark first. Both policies reach a 100% success rate on all three evaluation seeds under severe domain randomization, and the repo is valuable precisely because it makes the mixed result on force reduction explicit.

Quick Facts

| Item | Value |
| --- | --- |
| Task | Peg insertion in simulation under domain randomization |
| Dataset | EXOKERN/contactbench-forge-peginsert-v0.1.1 |
| Simulator | NVIDIA Isaac Lab (Isaac Sim 4.5) |
| Robot | Franka FR3 |
| Architecture | TemporalUNet1D diffusion policy |
| Parameters | 71.3M |
| Observation horizon | 10 frames |
| Prediction / execution horizon | 16 / 8 actions |
| Seeds evaluated | 42, 123, 7 |
| Total rollouts reported | 600 |
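
The observation horizon and prediction / execution horizon entries above describe a receding-horizon control loop: the policy sees the last 10 observations, predicts 16 actions, executes the first 8, then replans. The sketch below illustrates that loop under stated assumptions. `dummy_predict`, `run_receding_horizon`, and `step_fn` are hypothetical names, not part of this repo, and padding the initial window by repeating the first observation is a common convention that this card does not confirm.

```python
from collections import deque

OBS_HORIZON = 10    # frames of history given to the policy
PRED_HORIZON = 16   # actions predicted per policy call
EXEC_HORIZON = 8    # actions executed before replanning
ACTION_DIM = 7


def dummy_predict(obs_window):
    """Hypothetical stand-in for the diffusion policy: returns zero actions."""
    return [[0.0] * ACTION_DIM for _ in range(PRED_HORIZON)]


def run_receding_horizon(first_obs, step_fn, n_steps, predict=dummy_predict):
    """Execute n_steps actions, replanning after every EXEC_HORIZON actions."""
    # Assumption: fill the initial window by repeating the first observation.
    window = deque([first_obs] * OBS_HORIZON, maxlen=OBS_HORIZON)
    executed = 0
    replans = 0
    while executed < n_steps:
        plan = predict(list(window))
        replans += 1
        # Only the first EXEC_HORIZON actions of each plan are applied.
        for action in plan[:EXEC_HORIZON]:
            if executed >= n_steps:
                break
            window.append(step_fn(action))  # step_fn returns the next observation
            executed += 1
    return executed, replans
```

With these horizons, an episode of 20 control steps triggers three policy calls (8 + 8 + 4 executed actions).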

Benchmark Summary

The Hub metadata for this repo tracks the primary full_ft checkpoint. The full repo includes the paired no_ft ablation for comparison.

| Checkpoint | Success Rate (%) | Avg Contact Force (N) | Peak Contact Force (N) | Avg Episode Time (s) |
| --- | --- | --- | --- | --- |
| full_ft | 100.0 | 3.67 +/- 0.45 | 10.63 | 25.63 |
| no_ft | 100.0 | 3.37 +/- 0.06 | 10.33 | 25.73 |

Figure: multi-seed benchmark summary built from the published eval_seed42/123/7.json artifacts.

Per-seed results:

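| Seed | Condition | Success Rate (%) | Avg Force (N) | Peak Force (N) | Avg Time (s) |
| --- | --- | --- | --- | --- | --- |
| 42 | full_ft | 100.0 | 3.24 | 10.44 | 25.61 |
| 42 | no_ft | 100.0 | 3.38 | 10.38 | 25.73 |
| 123 | full_ft | 100.0 | 4.12 | 10.57 | 25.74 |
| 123 | no_ft | 100.0 | 3.34 | 10.32 | 25.79 |
| 7 | full_ft | 100.0 | 3.69 | 10.93 | 25.54 |
| 7 | no_ft | 100.0 | 3.37 | 10.31 | 25.68 |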
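
To cross-check the summary row against the per-seed table, the per-seed values can be aggregated by hand. The sketch below is one way to do that, assuming a plain mean and sample standard deviation across the three seeds. The card does not say which estimator produced the published "+/-" figures, and the inputs here are already rounded to two decimals, so small differences from the published summary, especially in the spread, are expected. `PER_SEED` and `summarize` are illustrative names.

```python
from statistics import mean, stdev

# Values copied from the per-seed table: (avg force N, peak force N, avg time s).
PER_SEED = {
    "full_ft": {42: (3.24, 10.44, 25.61), 123: (4.12, 10.57, 25.74), 7: (3.69, 10.93, 25.54)},
    "no_ft": {42: (3.38, 10.38, 25.73), 123: (3.34, 10.32, 25.79), 7: (3.37, 10.31, 25.68)},
}


def summarize(condition):
    """Aggregate one condition across seeds (mean, plus sample std for avg force)."""
    rows = list(PER_SEED[condition].values())
    forces = [row[0] for row in rows]
    return {
        "avg_force_mean": mean(forces),
        "avg_force_std": stdev(forces),  # assumption: sample std over seeds
        "peak_force_mean": mean(row[1] for row in rows),
        "avg_time_mean": mean(row[2] for row in rows),
    }


for condition in PER_SEED:
    stats = summarize(condition)
    print(condition, {key: round(value, 2) for key, value in stats.items()})
```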

Interpretation:

  • This release demonstrates robust task completion under multi-layer domain randomization, a much harder collection regime than the mostly fixed conditions of v0.
  • On this particular peg-in-hole setup, domain randomization largely closed the force gap between full_ft and no_ft: average contact force is 3.67 N with force/torque input and 3.37 N without it.
  • That does not prove force/torque is unnecessary in general. It means this release is best used as a robustness benchmark and an honest reference point for harder future tasks.

What Changed Compared To v0

| Topic | v0 | v0.1.1 |
| --- | --- | --- |
| Dataset regime | Mostly fixed conditions | Multi-layer domain randomization |
| Dataset size | 2,221 episodes / 330,929 frames | 5,000 episodes / 745,000 frames |
| Robot | Franka Emika Panda | Franka FR3 |
| Force reduction takeaway | Clear F/T advantage | Inconclusive on this task |
| Best use | Clean baseline | Robustness benchmark |
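
As a quick sanity check on the dataset sizes in the table, dividing frames by episodes gives the same episode length for both releases, so the growth from v0 to v0.1.1 (roughly 2.25 times as many episodes) comes from more episodes rather than longer ones. The snippet below is only a worked calculation on the numbers above. That episodes are fixed-length is an inference from the arithmetic, not something this card states.

```python
# Dataset sizes copied from the comparison table: (episodes, frames).
DATASETS = {"v0": (2221, 330929), "v0.1.1": (5000, 745000)}


def frames_per_episode(episodes, frames):
    """Average number of frames per episode."""
    return frames / episodes


for name, (episodes, frames) in DATASETS.items():
    print(name, frames_per_episode(episodes, frames))
# v0 149.0
# v0.1.1 149.0
```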

Architecture

This release uses the same 1D Temporal U-Net diffusion policy family as v0.

| Component | Value |
| --- | --- |
| Action dimension | 7 |
| Observation dimensions | 22 (full_ft) / 16 (no_ft) |
| Diffusion training steps | 100 |
| DDIM inference steps | 16 |
| Base channels | 256 |
| Channel multipliers | (1, 2, 4) |
| Normalization | Min-max to [-1, 1] |
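
The normalization entry above refers to per-dimension min-max scaling. A minimal sketch of that transform and its inverse follows, assuming the minima and maxima are computed per dimension over the training data. The helper names and the `eps` guard against constant dimensions are illustrative choices, not taken from inference.py.

```python
def fit_min_max(rows):
    """Per-dimension minimum and maximum over a list of equal-length vectors."""
    dims = range(len(rows[0]))
    lo = [min(row[d] for row in rows) for d in dims]
    hi = [max(row[d] for row in rows) for d in dims]
    return lo, hi


def normalize(x, lo, hi, eps=1e-8):
    """Map each dimension from [lo, hi] to [-1, 1]."""
    return [2.0 * (v - l) / max(h - l, eps) - 1.0 for v, l, h in zip(x, lo, hi)]


def denormalize(y, lo, hi, eps=1e-8):
    """Inverse of normalize: map each dimension from [-1, 1] back to [lo, hi]."""
    return [(v + 1.0) / 2.0 * max(h - l, eps) + l for v, l, h in zip(y, lo, hi)]
```

In a diffusion policy, observations are typically normalized before the model call and predicted actions are denormalized afterwards, each with its own statistics.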

Repository Contents

| File | Description |
| --- | --- |
| full_ft_best_model.pt | Best checkpoint with force/torque input |
| no_ft_best_model.pt | Ablation checkpoint without force/torque input |
| inference.py | Self-contained inference helper and model definition |
| config.yaml | Training, dataset, and environment configuration |
| eval_seed42.json | Seed 42 evaluation artifact |
| eval_seed123.json | Seed 123 evaluation artifact |
| eval_seed7.json | Seed 7 evaluation artifact |
| training_curve_full_ft_seed42.png | Training curve for full_ft, seed 42 |
| training_curve_full_ft_seed123.png | Training curve for full_ft, seed 123 |
| training_curve_full_ft_seed7.png | Training curve for full_ft, seed 7 |
| training_curve_no_ft_seed42.png | Training curve for no_ft, seed 42 |
| training_curve_no_ft_seed123.png | Training curve for no_ft, seed 123 |
| training_curve_no_ft_seed7.png | Training curve for no_ft, seed 7 |

Usage

Reproduce evaluation with exokern-eval

pip install exokern-eval

wget https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/full_ft_best_model.pt

exokern-eval \
  --policy full_ft_best_model.pt \
  --env Isaac-Forge-PegInsert-Direct-v0 \
  --episodes 100

Load the repo helper locally

import os
import sys

from huggingface_hub import snapshot_download

# Download only the checkpoints and the inference helper from the Hub.
repo_dir = snapshot_download(
    repo_id="EXOKERN/skill-forge-peginsert-v0.1.1",
    allow_patterns=["*.pt", "inference.py"],
)
# Make the downloaded inference.py importable.
sys.path.insert(0, repo_dir)

from inference import DiffusionPolicyInference

policy = DiffusionPolicyInference(
    os.path.join(repo_dir, "full_ft_best_model.pt"),
    device="cpu",
)

# full_ft expects 22D observations; no_ft_best_model.pt expects 16D.
policy.add_observation([0.0] * 22)
actions = policy.get_actions()
print(len(actions))

Training And Evaluation Setup

| Item | Value |
| --- | --- |
| Train / val split | 85% / 15% by episode |
| Epochs | 300 |
| Batch size | 256 |
| Optimizer | AdamW, lr=1e-4, weight_decay=1e-4 |
| LR schedule | Cosine annealing to 1e-6 |
| EMA decay | 0.995 |
| Physics rate | 120 Hz |
| Control rate | 15 Hz |
| Domain randomization | Enabled in the training dataset |
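
For readers unfamiliar with the schedule and EMA entries above, the sketch below writes out the standard cosine annealing formula and the usual exponential moving average update using the values from the table. It assumes the schedule is stepped once per epoch over all 300 epochs with no warmup, which the card does not confirm. `cosine_lr` and `ema_update` are illustrative helpers, not the training code.

```python
import math


def cosine_lr(epoch, total_epochs=300, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing from lr_max at epoch 0 to lr_min at total_epochs."""
    cos_term = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos_term)


def ema_update(ema_weights, weights, decay=0.995):
    """One EMA step: keep `decay` of the running average, blend in the rest."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Learning rate at the start, midpoint, and end of training.
for epoch in (0, 150, 300):
    print(epoch, cosine_lr(epoch))
```

With a decay of 0.995, each update moves the averaged weights 0.5% of the way toward the current weights, so the average reflects roughly the last few hundred updates.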

Citation

@misc{exokern_skill_peginsert_v011_2026,
  title        = {EXOKERN Skill v0.1.1: Robust Peg Insertion Under Domain Randomization},
  author       = {{EXOKERN}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1}},
  note         = {Paired full_ft and no_ft diffusion-policy checkpoints}
}

Security Note

The checkpoints in this repo are PyTorch pickles. Load them only in a trusted or isolated environment after reviewing the repository contents.
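
One way to reduce the pickle risk when inspecting a checkpoint directly is PyTorch's restricted unpickler, enabled with `weights_only=True` in `torch.load`. The sketch below assumes PyTorch is installed. `load_checkpoint_restricted` is a hypothetical helper, and this card does not confirm that the published checkpoints load in this mode. If they store arbitrary Python objects alongside tensors, the call raises an error, which is itself a signal to inspect the file in an isolated environment.

```python
def load_checkpoint_restricted(path, device="cpu"):
    """Load a checkpoint using PyTorch's restricted (weights-only) unpickler."""
    import torch  # imported lazily so the helper can be defined without PyTorch

    # weights_only=True limits unpickling to tensors and basic container types
    # and raises on anything else instead of executing it.
    return torch.load(path, map_location=device, weights_only=True)
```

Note that the repo's own inference.py may load the checkpoint differently, so this helper is for inspection rather than a drop-in replacement.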

Limitations

  • Simulation only. This release does not claim real-robot readiness.
  • Reported robustness is specific to the peg-in-hole task and the randomization ranges documented in the paired dataset card.
  • The ablation result is mixed: use this repo to study robustness, not to overclaim a universal force/torque effect.
  • The repo exposes paired checkpoints for research comparison; the intended production-style reference in this repo is full_ft_best_model.pt.
