Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
Weixun Wang / XiaoXiao Xu / Wanhe An / Fangwen Dai / Wei Gao / Yancheng He / Ju Huang / Qiang Ji / Hanqi Jin / Xiaoyang Li / Yang Li / Zhongwen Li / Shirong Lin / Jiashun Liu / Zenan Liu / Tao Luo / Dilxat Muhtar / Yuanbin Qu / Jiaqiang Shi / Qinghui Sun / Yingshui Tan / Hao Tang / Runze Wang / Yi Wang / Zhaoguo Wang / Yanan Wu / Shaopan Xiong / Binchen Xu / Xander Xu / Yuchi Xu / Qipeng Zhang / Xixia Zhang / Haizhou Zhao / Jie Zhao / Shuaibing Zhao / Baihui Zheng / Jianhui Zheng / Suhang Zheng / Yanni Zhu / Mengze Cai / Kerui Cao / Xitong Chen / Yue Dai / Lifan Du / Tao Feng / Tao He / Jin Hu / Yijie Hu / Ziyu Jiang / Cheng Li / Xiang Li / Jing Liang / Xin Lin / Chonghuan Liu / ZhenDong Liu / Zhiqiang Lv / Haodong Mi / Yanhu Mo / Junjia Ni / Shixin Pei / Jingyu Shen / XiaoShuai Song / Cecilia Wang / Chaofan Wang / Kangyu Wang / Pei Wang / Tao Wang / Wei Wang / Ke Xiao / Mingyu Xu / Tiange Xu / Nan Ya / Siran Yang / Jianan Ye / Yaxing Zang / Duo Zhang / Junbo Zhang / Boren Zheng / Wanxi Deng / Ling Pan / Lin Qu / Wenbo Su / Jiamang Wang / Wei Wang / Hu Wei / Minggang Wu / Cheng Yu / Bing Zhao / Zhicheng Zheng / Bo Zheng
Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem that streamlines agent development. We introduce the Agentic Learning Ecosystem (ALE), foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent built on ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-Perceptive Agentic Policy Optimization (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME within a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME achieves strong performance across benchmarks such as SWE-bench Verified and Terminal Bench, demonstrating the effectiveness of ALE.
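The abstract describes IPA only at a high level. As a rough illustration of what chunk-level credit assignment means (a minimal sketch, not the authors' IPA implementation; the function names, normalization scheme, and loss form here are all assumptions), one could broadcast a single advantage per interaction chunk to every token in that chunk, rather than estimating per-token credit:

```python
import torch

def chunk_level_loss(token_logprobs: torch.Tensor,
                     chunk_ids: torch.Tensor,
                     chunk_rewards: torch.Tensor) -> torch.Tensor:
    """Hypothetical chunk-level policy-gradient loss.

    token_logprobs: (T,) per-token log-probs under the current policy.
    chunk_ids:      (T,) long tensor mapping each token to its interaction chunk.
    chunk_rewards:  (C,) one scalar reward estimate per semantic chunk
                    (e.g., one assistant turn plus its tool observation).
    """
    # Normalize chunk rewards into advantages against a mean baseline.
    adv = chunk_rewards - chunk_rewards.mean()
    if chunk_rewards.numel() > 1:
        adv = adv / (chunk_rewards.std() + 1e-8)
    # Each token inherits its chunk's advantage: credit is assigned at the
    # granularity of semantic interaction chunks, not individual tokens.
    token_adv = adv[chunk_ids]
    # Plain REINFORCE-style surrogate; PPO clipping omitted for brevity.
    return -(token_logprobs * token_adv).mean()

# Toy usage: 6 tokens spanning 2 interaction chunks.
logp = torch.randn(6, requires_grad=True)
ids = torch.tensor([0, 0, 0, 1, 1, 1])
rewards = torch.tensor([1.0, -0.5])
loss = chunk_level_loss(logp, ids, rewards)
loss.backward()
```

Coarsening the credit unit this way reduces the variance of per-token advantage estimates over long horizons, which is consistent with the stability benefit the abstract claims for IPA.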