- text-to-motion
---

# FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

<div align="center">
<img src="./assets/hi_logo.png" alt="FRoM-W1" width="7.5%">

The **H**umanoid **I**ntelligence Team from FudanNLP and OpenMOSS
</div>

<div align="center">
<a href="https://github.com/OpenMOSS/FRoM-W1">💻Github</a> <a href="https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets">🤗Datasets</a> <a href="https://huggingface.co/OpenMOSS-Team/FRoM-W1">🤗Models</a>
</div>

## Introduction
<div align="center">
<img src="./assets/FRoM-W1-Teaser.png" alt="FRoM-W1" width="50%">
</div>

Humanoid robots are capable of performing various actions such as greeting, dancing, and even backflipping. However, these motions are often hard-coded or specifically trained, which limits their versatility.
In this work, we present **FRoM-W1[^1]**, an open-source framework designed to achieve general humanoid whole-body motion control using natural language.
To universally understand natural language and generate corresponding motions, as well as to enable various humanoid robots to stably execute these motions in the physical world under gravity, **FRoM-W1** operates in two stages:

(a) **H-GPT**: Utilizing massive human data, a large-scale language-driven human whole-body motion generation model is trained to generate diverse natural behaviors.
We further leverage the Chain-of-Thought technique to improve the model’s generalization in instruction understanding.

(b) **H-ACT**: After retargeting generated human whole-body motions into robot-specific actions, a motion controller that is pretrained and further fine-tuned through reinforcement learning in physical simulation enables humanoid robots to accurately and stably perform corresponding actions.
It is then deployed on real robots via a modular simulation-to-reality module.

We extensively evaluate our framework on the Unitree H1 and G1 robots, demonstrating successful language-to-motion generation and stable execution in both simulation and real-world settings.
We fully open-source the entire **FRoM-W1** framework and hope it will advance the development of humanoid intelligence.

[^1]: **F**oundational Humanoid **Ro**bot **M**odel - **W**hole-Body Control, Version **1**
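
To make the two-stage design concrete, the sketch below traces the data flow from a language instruction to robot execution. It is purely illustrative: every function name is a hypothetical stand-in, not the actual H-GPT or H-ACT API (the real entry points live in the module directories).

```python
# Illustrative data flow only: these functions are hypothetical stand-ins,
# NOT the real FRoM-W1 / H-GPT / H-ACT interfaces.

def h_gpt_generate(instruction: str) -> list:
    """Stage (a): language instruction -> human whole-body motion representation."""
    return []  # placeholder (e.g., a sequence of SMPL-X poses)

def retarget_to_robot(human_motion: list, robot: str) -> list:
    """Stage (b), step 1: map the human motion onto a specific robot's joints."""
    return []  # placeholder reference trajectory for the robot

def track_with_controller(reference: list, robot: str) -> None:
    """Stage (b), step 2: the RL-fine-tuned whole-body controller tracks the
    reference in simulation and, via the sim-to-real module, on hardware."""
    pass

if __name__ == "__main__":
    motion = h_gpt_generate("wave with the right hand, then walk forward")
    reference = retarget_to_robot(motion, robot="unitree_g1")
    track_with_controller(reference, robot="unitree_g1")
```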
## Release Timeline
We will gradually release the paper, data, codebase, model checkpoints, and the real-robot deployment framework for **FRoM-W1** in the next week or two.

Here is the current release progress (a download sketch follows the list):
- [**2025/12/14**] We have released the **CoT data** of HumanML3D-X on **[HuggingFace Datasets](https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets)**.
- [**2025/12/13**] We have uploaded the checkpoints for H-GPT, the baselines (SMPL-X versions of T2M, MotionDiffuse, MLD, and T2M-GPT), and the SMPL-X motion-generation evaluation model on **[HuggingFace Models](https://huggingface.co/OpenMOSS-Team/FRoM-W1)**.
- [**2025/12/10**] We have uploaded the initial version of the code for two core modules, **[H-GPT](./H-GPT/README.md)** and **[H-ACT](./H-ACT/README.md)**!
- [**2025/12/10**] We have released our lightweight, modular humanoid-robot deployment framework [**RoboJuDo**](https://github.com/HansZ8/RoboJuDo)!
- [**2025/12/10**] We are thrilled to initiate the release of **FRoM-W1**!
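
The released checkpoints and data can be pulled from the Hugging Face Hub. Below is a minimal sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`); the repository IDs are the ones linked above, and the local directories are only example paths:

```python
from huggingface_hub import snapshot_download

# Model checkpoints (H-GPT, the baselines, and the evaluation model)
snapshot_download(
    repo_id="OpenMOSS-Team/FRoM-W1",
    local_dir="checkpoints/FRoM-W1",        # example path, adjust as needed
)

# CoT data of HumanML3D-X
snapshot_download(
    repo_id="OpenMOSS-Team/FRoM-W1-Datasets",
    repo_type="dataset",
    local_dir="data/FRoM-W1-Datasets",      # example path, adjust as needed
)
```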

## Usage

<div align="center">
<img src="./assets/FRoM-W1-Overview.png" alt="overview" width="80%">
</div>

The complete **FRoM-W1** workflow is illustrated above:

- **H-GPT**
Deploy **H-GPT** via command-line tools or a web interface to convert natural-language commands into human motion representations.
This module provides full training, inference, and evaluation code, and pretrained models are available on HuggingFace.

<div align="center">
<img src="./assets/FRoM-W1-HGPT.png" alt="fromw1-hgpt" width="80%">
</div>

- **H-ACT**
**H-ACT** converts the motion representations from H-GPT into SMPL-X motion sequences and further retargets them to various humanoid robots.
The resulting motions can be used both for training control policies and for executing actions on real robots using our deployment pipeline (a sketch of the SMPL-X sequence layout is given below).

<div align="center">
<img src="./assets/FRoM-W1-HACT.png" alt="fromw1-hact" width="80%">
</div>

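For readers unfamiliar with SMPL-X, a whole-body motion sequence is essentially a stack of per-frame body-model parameters. The sketch below shows one common in-memory layout using standard SMPL-X dimensions; the exact arrays and file format used by H-ACT are an assumption here and may differ (see H-ACT/README.md).

```python
import numpy as np

T = 120  # example: 120 frames of motion

# A typical SMPL-X whole-body sequence, one row per frame (axis-angle, radians).
# Dimensions follow the standard SMPL-X parameterization; H-ACT's actual
# serialization may differ.
smplx_sequence = {
    "transl":          np.zeros((T, 3)),    # global root translation
    "global_orient":   np.zeros((T, 3)),    # root orientation
    "body_pose":       np.zeros((T, 63)),   # 21 body joints x 3
    "left_hand_pose":  np.zeros((T, 45)),   # 15 finger joints x 3
    "right_hand_pose": np.zeros((T, 45)),
    "jaw_pose":        np.zeros((T, 3)),
    "betas":           np.zeros(10),        # body shape, constant over the clip
}

# Retargeting maps each frame onto a specific robot's joint space
# (e.g., the Unitree H1 or G1), producing a reference trajectory for the controller.
```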
## Citation
If you find our work useful, please cite it for now as follows:
```bibtex