LLM-in-Sandbox
Collection
Data and models for the paper: Computer Environments Elicit General Agentic Intelli. Feel free to open an issue if you have any questions or problems! • 3 items • Updated • 1
This is the model checkpoint trained with LLM-in-Sandbox-RL from our paper: Computer Environments Elicit General Agentic Intelligence in LLMs. The base model is Qwen/Qwen3-4B-Instruct-2507. The training data is available at llm-in-sandbox-rl dataset and the training code is at llm-in-sandbox-rl code.
vllm serve daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL \
--served-model-name qwen3-4b-instruct-sandbox-rl \
--enable-prefix-caching \
--tensor-parallel-size 4 \
--enable-auto-tool-choice \
--tool-call-parser hermes
Please refer to our RL training code for reproducing this checkpoint and our inference code to use this model for LLM-in-Sandbox inference and reproduce our paper results.
If you find our work helpful, please cite us:
@article{cheng2026llm,
title={Llm-in-sandbox elicits general agentic intelligence},
author={Cheng, Daixuan and Huang, Shaohan and Gu, Yuxian and Song, Huatong and Chen, Guoxin and Dong, Li and Zhao, Wayne Xin and Wen, Ji-Rong and Wei, Furu},
journal={arXiv preprint arXiv:2601.16206},
year={2026}
}
Base model
Qwen/Qwen3-4B-Instruct-2507