---
library_name: transformers
license: mit
tags:
- autoregressive
- VLA
pipeline_tag: robotics
---

| # Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos |
|
|
| <p align="center"> |
| <img src="https://raw.githubusercontent.com/BeingBeyond/Being-H0/refs/heads/main/docs/assets/image/being-h0-black.png" width="300"/> |
</p>
| <div align="center"> |
|
|
[Project Page](https://beingbeyond.github.io/Being-H0) | [Paper](https://arxiv.org/abs/2507.15597) | [Code](https://github.com/BeingBeyond/Being-H0) | [License](./LICENSE)
|
|
| </div> |
|
|
| <p align="center"> |
| <img src="https://raw.githubusercontent.com/BeingBeyond/Being-H0/refs/heads/main/docs/assets/image/overview.png"/> |
</p>
|
|
| We introduce **Being-H0**, the first dexterous Vision-Language-Action model pretrained from large-scale human videos via explicit hand motion modeling. |
|
|
| ## News |
|
|
| - **[2025-07-21]**: We publish **Being-H0**! Check our paper [here](https://arxiv.org/abs/2507.15597). |
|
|
| ## Code & Model |
|
|
| We will release the code and model weights soon! |
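
Until then, here is a minimal sketch of how loading might look with 🤗 Transformers (the card's front matter lists `library_name: transformers`). The Hub repo id `BeingBeyond/Being-H0` and the use of the `AutoProcessor`/`AutoModel` classes with `trust_remote_code` are assumptions, not the confirmed API:

```python
# Hypothetical loading sketch: weights are not yet released, so the repo id
# and the Auto classes below are assumptions until the official release.
from transformers import AutoModel, AutoProcessor

model_id = "BeingBeyond/Being-H0"  # assumed Hub id; check the repo once published

# VLA models often ship custom modeling code, hence trust_remote_code=True
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
```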
|
|
| ## Citation |
If you find our work useful, please consider citing us and starring our repository!
|
|
| **Being-H0** |
|
|
| ```bibtex |
| @article{beingbeyond2025beingh0, |
| title={Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos}, |
| author={Luo, Hao and Feng, Yicheng and Zhang, Wanpeng and Zheng, Sipeng and Wang, Ye and Yuan, Haoqi and Liu, Jiazheng and Xu, Chaoyi and Jin, Qin and Lu, Zongqing}, |
| journal={arXiv preprint arXiv:2507.15597}, |
| year={2025} |
| } |
| ``` |