EXOKERN Skill v0.1.1 - Robust Peg Insertion Under Domain Randomization
skill-forge-peginsert-v0.1.1 is the domain-randomized reference model release in the EXOKERN catalog. It is trained on EXOKERN ContactBench v0.1.1 and ships the same paired comparison structure as v0:
- full_ft_best_model.pt: primary checkpoint with 22D observations, including force/torque input
- no_ft_best_model.pt: ablation checkpoint with the same architecture and 16D state-only observations
This release should be read as a robustness benchmark first. Both policies remain successful under severe domain randomization, and the repo is valuable precisely because it makes the mixed result on force reduction explicit.
Quick Facts
| Item | Value |
|---|---|
| Task | Peg insertion in simulation under domain randomization |
| Dataset | EXOKERN/contactbench-forge-peginsert-v0.1.1 |
| Simulator | NVIDIA Isaac Lab (Isaac Sim 4.5) |
| Robot | Franka FR3 |
| Architecture | TemporalUNet1D diffusion policy |
| Parameters | 71.3M |
| Observation horizon | 10 frames |
| Prediction / execution horizon | 16 / 8 actions |
| Seeds evaluated | 42, 123, 7 |
| Total rollouts reported | 600 |
Benchmark Summary
The Hub metadata for this repo tracks the primary full_ft checkpoint. The full repo includes the paired no_ft ablation for comparison.
| Checkpoint | Success Rate (%) | Avg Contact Force (N) | Peak Contact Force (N) | Avg Episode Time (s) |
|---|---|---|---|---|
| full_ft | 100.0 | 3.67 +/- 0.45 | 10.63 | 25.63 |
| no_ft | 100.0 | 3.37 +/- 0.06 | 10.33 | 25.73 |
Figure: multi-seed benchmark summary built from the published eval_seed42/123/7.json artifacts.
Per-seed results:
| Seed | Condition | Success Rate (%) | Avg Force (N) | Peak Force (N) | Avg Time (s) |
|---|---|---|---|---|---|
| 42 | full_ft | 100.0 | 3.24 | 10.44 | 25.61 |
| 42 | no_ft | 100.0 | 3.38 | 10.38 | 25.73 |
| 123 | full_ft | 100.0 | 4.12 | 10.57 | 25.74 |
| 123 | no_ft | 100.0 | 3.34 | 10.32 | 25.79 |
| 7 | full_ft | 100.0 | 3.69 | 10.93 | 25.54 |
| 7 | no_ft | 100.0 | 3.37 | 10.31 | 25.68 |
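The per-seed force numbers can be cross-checked with a few lines of Python. This is a minimal sketch, not the script that produced the published summary: the dictionary layout and the `summarize` helper are illustrative assumptions, and the card does not state how the +/- spread was computed, so the seed-level sample standard deviation used here is a guess. Because it starts from the rounded two-decimal table entries, the recomputed values can differ from the summary row in the last digit.

```python
from statistics import mean, stdev

# Per-seed average contact force (N), copied from the per-seed table.
avg_force = {
    "full_ft": {42: 3.24, 123: 4.12, 7: 3.69},
    "no_ft": {42: 3.38, 123: 3.34, 7: 3.37},
}


def summarize(values):
    """Return (mean, sample standard deviation) across seeds."""
    vals = list(values)
    return mean(vals), stdev(vals)


summary = {name: summarize(per_seed.values()) for name, per_seed in avg_force.items()}
for name, (m, s) in summary.items():
    print(f"{name}: {m:.2f} +/- {s:.2f} N")

# Per-seed difference (full_ft minus no_ft); positive means full_ft
# applied more average force than the ablation on that seed.
gap = {
    seed: round(avg_force["full_ft"][seed] - avg_force["no_ft"][seed], 2)
    for seed in avg_force["full_ft"]
}
print(gap)
```

The sign of the per-seed gap is not consistent across the three seeds, which is one way to see why the force comparison is described as mixed.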
Interpretation:
- This release demonstrates robust task completion under a much harder collection regime than v0.
- On this particular peg-in-hole setup, domain randomization largely closed the force gap between full_ft and no_ft.
- That does not prove force/torque is unnecessary in general. It shows that this release is best used as a robust benchmark and an honest reference point for harder future tasks.
What Changed Compared To v0
| Topic | v0 | v0.1.1 |
|---|---|---|
| Dataset regime | Mostly fixed conditions | Multi-layer domain randomization |
| Dataset size | 2,221 episodes / 330,929 frames | 5,000 episodes / 745,000 frames |
| Robot | Franka Emika Panda | Franka FR3 |
| Force reduction takeaway | Clear F/T advantage | Inconclusive on this task |
| Best use | Clean baseline | Robustness benchmark |
Architecture
This release uses the same 1D Temporal U-Net diffusion policy family as v0.
| Component | Value |
|---|---|
| Action dimension | 7 |
| Observation dimensions | 22 (full_ft) / 16 (no_ft) |
| Diffusion training steps | 100 |
| DDIM inference steps | 16 |
| Base channels | 256 |
| Channel multipliers | (1, 2, 4) |
| Normalization | Min-max to [-1, 1] |
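The "Min-max to [-1, 1]" row can be illustrated with a small, dependency-free sketch. The function names and the toy 2D data are assumptions for illustration only; the actual statistics and implementation live in the repo's inference.py and config.yaml and may differ in detail (for example in how constant dimensions are handled).

```python
def fit_min_max(rows):
    """Per-dimension (min, max) over a list of equal-length vectors."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(x, lo, hi, eps=1e-8):
    """Map each dimension from [lo, hi] to [-1, 1]."""
    return [2.0 * (v - l) / max(h - l, eps) - 1.0 for v, l, h in zip(x, lo, hi)]


def denormalize(y, lo, hi):
    """Inverse of normalize: map [-1, 1] back to [lo, hi]."""
    return [(v + 1.0) / 2.0 * (h - l) + l for v, l, h in zip(y, lo, hi)]


# Toy 2D data standing in for the real 22D / 16D observation statistics.
data = [[0.0, -5.0], [2.0, 5.0], [1.0, 0.0]]
lo, hi = fit_min_max(data)
scaled = normalize([1.0, 0.0], lo, hi)
restored = denormalize(scaled, lo, hi)
print(scaled, restored)
```

The same pair of functions would typically be applied to observations on the way into the network and, in reverse, to predicted actions on the way out.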
Repository Contents
| File | Description |
|---|---|
| full_ft_best_model.pt | Best checkpoint with force/torque input |
| no_ft_best_model.pt | Ablation checkpoint without force/torque input |
| inference.py | Self-contained inference helper and model definition |
| config.yaml | Training, dataset, and environment configuration |
| eval_seed42.json | Seed 42 evaluation artifact |
| eval_seed123.json | Seed 123 evaluation artifact |
| eval_seed7.json | Seed 7 evaluation artifact |
| training_curve_full_ft_seed42.png | Training curve for full_ft, seed 42 |
| training_curve_full_ft_seed123.png | Training curve for full_ft, seed 123 |
| training_curve_full_ft_seed7.png | Training curve for full_ft, seed 7 |
| training_curve_no_ft_seed42.png | Training curve for no_ft, seed 42 |
| training_curve_no_ft_seed123.png | Training curve for no_ft, seed 123 |
| training_curve_no_ft_seed7.png | Training curve for no_ft, seed 7 |
Usage
Reproduce evaluation with exokern-eval
pip install exokern-eval
wget https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1/resolve/main/full_ft_best_model.pt
exokern-eval \
  --policy full_ft_best_model.pt \
  --env Isaac-Forge-PegInsert-Direct-v0 \
  --episodes 100
Load the repo helper locally
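import os
import sys

from huggingface_hub import snapshot_download

# Download only the checkpoints and the inference helper from the Hub.
repo_dir = snapshot_download(
    repo_id="EXOKERN/skill-forge-peginsert-v0.1.1",
    allow_patterns=["*.pt", "inference.py"],
)

# Make the downloaded inference.py importable.
sys.path.insert(0, repo_dir)
from inference import DiffusionPolicyInference

# Load the primary checkpoint (22D observations, including force/torque).
policy = DiffusionPolicyInference(
    os.path.join(repo_dir, "full_ft_best_model.pt"),
    device="cpu",
)

# Push one 22D observation, then query the policy for actions.
policy.add_observation([0.0] * 22)
actions = policy.get_actions()
print(len(actions))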
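In closed loop, the 16 / 8 prediction and execution horizons from Quick Facts suggest a receding-horizon pattern: predict a sequence, execute the first 8 actions, then replan from fresh observations. The sketch below shows that control flow only. `StubPolicy` and `fake_env_step` are hypothetical stand-ins so the example runs without the checkpoint or Isaac Lab, and whether the real `get_actions()` returns 16 or 8 actions per call is not stated in this card, so the sketch slices defensively.

```python
EXEC_HORIZON = 8   # actions executed before replanning (Quick Facts)
PRED_HORIZON = 16  # actions predicted per call (Quick Facts)
ACTION_DIM = 7
OBS_DIM = 22


class StubPolicy:
    """Hypothetical stand-in exposing the two helper methods used above."""

    def __init__(self):
        self.history = []

    def add_observation(self, obs):
        self.history.append(list(obs))

    def get_actions(self):
        # Assumption: a full predicted sequence is returned; zeros here.
        return [[0.0] * ACTION_DIM for _ in range(PRED_HORIZON)]


def fake_env_step(action):
    """Hypothetical environment step that returns the next observation."""
    return [0.0] * OBS_DIM


def rollout(policy, first_obs, max_steps=24):
    """Execute EXEC_HORIZON actions per plan, feeding back every observation."""
    executed, replans = 0, 0
    policy.add_observation(first_obs)
    while executed < max_steps:
        plan = policy.get_actions()[:EXEC_HORIZON]
        replans += 1
        for action in plan:
            obs = fake_env_step(action)
            policy.add_observation(obs)
            executed += 1
            if executed >= max_steps:
                break
    return executed, replans


executed, replans = rollout(StubPolicy(), [0.0] * OBS_DIM)
print(executed, replans)
```

With the real helper, `StubPolicy` would be replaced by the loaded `DiffusionPolicyInference` object and `fake_env_step` by the simulator's step call.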
Training And Evaluation Setup
| Item | Value |
|---|---|
| Train / val split | 85% / 15% by episode |
| Epochs | 300 |
| Batch size | 256 |
| Optimizer | AdamW, lr=1e-4, weight_decay=1e-4 |
| LR schedule | Cosine annealing to 1e-6 |
| EMA decay | 0.995 |
| Physics rate | 120 Hz |
| Control rate | 15 Hz |
| Domain randomization | Enabled in the training dataset |
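Several rows of this table reduce to short formulas. The sketch below is illustrative rather than the training code: the function names, the shuffle seed, and the choice to step the schedule once per epoch are assumptions, and the 5,000-episode count is taken from the dataset size listed under "What Changed Compared To v0".

```python
import math
import random


def split_episodes(num_episodes, val_fraction=0.15, seed=0):
    """Shuffle episode indices and split so no episode spans both sets."""
    ids = list(range(num_episodes))
    random.Random(seed).shuffle(ids)
    num_val = round(num_episodes * val_fraction)
    return ids[num_val:], ids[:num_val]


def cosine_lr(epoch, total_epochs=300, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing from lr_max at epoch 0 to lr_min at total_epochs."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)


def ema_update(ema_value, new_value, decay=0.995):
    """One exponential-moving-average step for a single scalar weight."""
    return decay * ema_value + (1.0 - decay) * new_value


train_ids, val_ids = split_episodes(5000)
sim_steps_per_action = 120 // 15  # physics rate divided by control rate
print(len(train_ids), len(val_ids), sim_steps_per_action)
print(cosine_lr(0), cosine_lr(150), cosine_lr(300))
```

Splitting by episode rather than by frame keeps frames from one trajectory out of both sets at once, which is the usual reason for stating the split this way.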
Related Work
- FORGE: Force-Guided Exploration for Robust Contact-Rich Manipulation under Uncertainty
- Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
- Factory: Fast Contact for Robotic Assembly
Citation
@misc{exokern_skill_peginsert_v011_2026,
title = {EXOKERN Skill v0.1.1: Robust Peg Insertion Under Domain Randomization},
author = {{EXOKERN}},
year = {2026},
howpublished = {\url{https://huggingface.co/EXOKERN/skill-forge-peginsert-v0.1.1}},
note = {Paired full_ft and no_ft diffusion-policy checkpoints}
}
Security Note
The checkpoints in this repo are PyTorch pickles. Load them only in a trusted or isolated environment after reviewing the repository contents.
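One way to review a checkpoint before loading it is to list the globals its pickle stream would import, without unpickling anything. The sketch below uses only the standard library and is a heuristic, not a security guarantee. It assumes the checkpoints use PyTorch's zip-based format (an assumption, not confirmed by this card), and its handling of STACK_GLOBAL relies on the opcode layout the standard pickler emits. The function names are illustrative.

```python
import pickletools
import zipfile

STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
              "SHORT_BINSTRING", "BINSTRING", "STRING"}
GET_OPS = {"BINGET", "LONG_BINGET", "GET"}
PUT_OPS = {"BINPUT", "LONG_BINPUT", "PUT"}


def referenced_globals(pickle_bytes):
    """Collect 'module.name' globals named in a pickle stream without loading it."""
    found, recent, memo = set(), [], {}
    prev_string = None  # string pushed by the immediately preceding opcode
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        name, pushed = opcode.name, None
        if name in STRING_OPS:
            pushed = arg
            recent.append(arg)
        elif name in GET_OPS:
            recent.append(memo.get(arg))
        elif name == "MEMOIZE":
            memo[len(memo)] = prev_string
        elif name in PUT_OPS:
            memo[arg] = prev_string
        elif name in ("GLOBAL", "INST"):
            found.add(arg.replace(" ", ".", 1))
        elif name == "STACK_GLOBAL":
            # Heuristic: module and attribute are the last two strings seen.
            pair = recent[-2:]
            if len(pair) == 2 and all(isinstance(s, str) for s in pair):
                found.add(f"{pair[0]}.{pair[1]}")
            else:
                found.add("<unresolved STACK_GLOBAL>")
        prev_string = pushed
    return found


def checkpoint_globals(path):
    """Scan every .pkl member of a zip-format checkpoint file."""
    if not zipfile.is_zipfile(path):
        raise ValueError("not a zip archive; legacy checkpoints need other handling")
    found = set()
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if member.endswith(".pkl"):
                found |= referenced_globals(archive.read(member))
    return sorted(found)
```

Calling `checkpoint_globals("full_ft_best_model.pt")` on a downloaded file would return a sorted list of names to review; anything outside the expected tensor-rebuilding and container types deserves a closer look before the file is loaded.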
Limitations
- Simulation only. This release does not claim real-robot readiness.
- Reported robustness is specific to the peg-in-hole task and the randomization ranges documented in the paired dataset card.
- The ablation result is mixed: use this repo to study robustness, not to overclaim a universal force/torque effect.
- The repo exposes paired checkpoints for research comparison; the intended production-style reference in this repo is full_ft_best_model.pt.
Related Resources
- Dataset: EXOKERN/contactbench-forge-peginsert-v0.1.1
- Baseline predecessor: EXOKERN/skill-forge-peginsert-v0
- Evaluation CLI: github.com/Exokern/exokern_eval
- Organization page: huggingface.co/EXOKERN

