GR-1 Humanoid Data Ecosystem Deep Review: From Real Teleoperation to Million-Scale Synthetic Trajectories
The GR-1 humanoid robot from Fourier Intelligence has become one of the most important platforms for embodied AI research. Three major datasets now exist around it, forming a data ecosystem spanning real-world teleoperation to million-scale synthetic generation.
This article provides a data engineer's perspective on all three GR-1 datasets, with side-by-side comparisons of scale, modality, licensing, and training results.
The Three Datasets
- Fourier ActionNet: 30K+ real teleoperation trajectories, CC BY-NC-SA 4.0
- NVIDIA GR-1 Simulation: Arena (50), Teleop-Sim (1,000), X-Embodiment (TB-scale)
- GR00T N1 Training Set: 780K synthetic trajectories plus real-robot data and internet video, Apache 2.0
Key Findings
- On the real GR-1, GR00T N1 reaches a 42.6% success rate with only 10% of the data and 76.8% with the full data
- Synthetic data alone reaches 46.4%, so real-world data remains essential for Sim-to-Real
- Full-body locomotion data is still not publicly available for GR-1
- License fragmentation across the three datasets requires careful commercial review
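As a rough illustration of the license point above, a first-pass screen might look like the sketch below. It only encodes the licenses listed in this article; the `DatasetLicense` type and `review_flags` helper are hypothetical names introduced here, and the NVIDIA GR-1 Simulation entry is left as unknown because no license is given for it above. It is a starting point for a review, not a substitute for reading the actual license texts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetLicense:
    """Hypothetical record: a dataset name and the license string listed above."""
    name: str
    license_id: Optional[str]  # None means no license is listed in this article


# Licenses exactly as listed in "The Three Datasets"; nothing else is assumed.
DATASETS = [
    DatasetLicense("Fourier ActionNet", "CC BY-NC-SA 4.0"),
    DatasetLicense("NVIDIA GR-1 Simulation", None),
    DatasetLicense("GR00T N1 Training Set", "Apache 2.0"),
]


def review_flags(ds: DatasetLicense) -> List[str]:
    """Return coarse flags that a commercial review should look at first."""
    if ds.license_id is None:
        return ["license not listed here; check the dataset card"]
    # Split "CC BY-NC-SA 4.0" into tokens: CC, BY, NC, SA, 4.0
    tokens = ds.license_id.upper().replace("-", " ").split()
    flags = []
    if "NC" in tokens:
        flags.append("non-commercial use only")
    if "SA" in tokens:
        flags.append("share-alike applies to derivatives")
    return flags


if __name__ == "__main__":
    for ds in DATASETS:
        flags = review_flags(ds)
        status = "; ".join(flags) if flags else "no flags from this coarse check"
        print(f"{ds.name}: {status}")
```

The check is deliberately coarse: it flags only the NC and SA clauses of a Creative Commons identifier and treats a missing license as something to resolve before any mixing of datasets.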
Selection Guide
| Goal | Recommended Dataset |
|---|---|
| Quick start (academic) | Fourier ActionNet |
| Train VLA foundation model | GR00T N1 Training Set |
| Sim-to-Real research | NVIDIA GR-1 Simulation + Fourier ActionNet |
| Dexterous hand / bimanual | Fourier ActionNet |
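For readers who want the table in a data pipeline or project config, it can be written as a small lookup. This is only a sketch: the `SELECTION_GUIDE` mapping and `recommend` function are hypothetical names, and the entries simply mirror the table rows.

```python
from typing import Dict, List

# Goal -> recommended dataset(s), mirroring the selection guide table.
SELECTION_GUIDE: Dict[str, List[str]] = {
    "quick start (academic)": ["Fourier ActionNet"],
    "train vla foundation model": ["GR00T N1 Training Set"],
    "sim-to-real research": ["NVIDIA GR-1 Simulation", "Fourier ActionNet"],
    "dexterous hand / bimanual": ["Fourier ActionNet"],
}


def recommend(goal: str) -> List[str]:
    """Look up a goal, ignoring case and extra whitespace."""
    key = " ".join(goal.lower().split())
    if key not in SELECTION_GUIDE:
        known = ", ".join(sorted(SELECTION_GUIDE))
        raise KeyError(f"unknown goal {goal!r}; known goals: {known}")
    return SELECTION_GUIDE[key]


if __name__ == "__main__":
    print(recommend("Sim-to-Real research"))
```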