Title & Ultra-Summary: CoRT supercharges LLMs' computation! 🎉
Gal-style sparkle points ✨
● It fixes LLMs' (large language models') computation problems with Hint-Engineering (hint injection)! ✨
● A new framework called CoRT makes LLMs and Code Interpreters (CIs) get along way better! 🚀
● AI might shine even more in education, finance, R&D, and all sorts of fields! 😍
Detailed explanation
● Background: LLMs are amazing, but a bit weak at computation 🥺 So people have tried pairing them with a CI (a tool like Python) to do the math, but there was a problem: the model's internal reasoning and the CI's outputs would clash, leaving the model going "huh? 🤔" and wasting effort.
● Method: CoRT uses a trick called Hint-Engineering! The idea is to inject hints into the LLM's reasoning path so it learns to use the CI smartly. Concretely, it's an efficient training pipeline combining SFT (supervised fine-tuning), RFT (rejection-sampling fine-tuning), and RL (reinforcement learning)!
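To make the hint-injection idea concrete, here is a minimal sketch: given a natural-language reasoning trace, we splice a hint in after a chosen step so the model is nudged toward the code interpreter. The hint texts, the `inject_hint` helper, and the insertion point are all illustrative assumptions, not the paper's actual data-synthesis pipeline.

```python
# Illustrative sketch of hint injection into a reasoning trace.
# The hint wordings and insertion logic are assumptions, not CoRT's real pipeline.

HINTS = {
    "arithmetic": ("Hint: this step involves heavy arithmetic -- "
                   "write Python code instead of computing by hand."),
    "verify": "Hint: verify the previous result with code before continuing.",
}

def inject_hint(reasoning_steps, position, hint_key):
    """Return a new trace with a hint inserted right after step `position`."""
    hint = HINTS[hint_key]
    return reasoning_steps[:position + 1] + [hint] + reasoning_steps[position + 1:]

trace = [
    "We need the sum of squares from 1 to 100.",
    "Compute 1^2 + 2^2 + ... + 100^2 step by step.",
    "So the answer is 338350.",
]

# Inject an "use code" hint right after the problem statement,
# before the model starts grinding through manual arithmetic.
augmented = inject_hint(trace, 0, "arithmetic")
print(augmented[1])
```

Synthesized traces like `augmented` (hint plus the code-using continuation the hint elicits) are the kind of code-integrated reasoning data the paper uses for supervised fine-tuning.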
● Results: In experiments, the models got faster and made fewer mistakes! 🎉 And it wasn't just math: the approach reportedly helped on chemistry problems too! So it could be useful across lots of fields!
Read the rest in the "らくらく論文" app
Large reasoning models (LRMs) like OpenAI-o1 have shown impressive capabilities in natural language reasoning. However, these models frequently demonstrate inefficiencies or inaccuracies when tackling complex mathematical operations. While integrating computational tools such as Code Interpreters (CIs) offers a promising solution, it introduces a critical challenge: a conflict between the model's internal, probabilistic reasoning and the external, deterministic knowledge provided by the CI, which often leads models to unproductive deliberation. To overcome this, we introduce CoRT (Code-Optimized Reasoning Training), a post-training framework designed to teach LRMs to effectively utilize CIs. We propose Hint-Engineering, a new data synthesis strategy that strategically injects diverse hints at optimal points within reasoning paths. This approach generates high-quality, code-integrated reasoning data specifically tailored to optimize LRM-CI interaction. Using this method, we have synthesized 30 high-quality samples to post-train models ranging from 1.5B to 32B parameters through supervised fine-tuning. CoRT further refines the multi-round interleaving of external CI usage and internal thinking by employing rejection sampling and reinforcement learning. Our experimental evaluations demonstrate CoRT's effectiveness, yielding absolute improvements of 4% and 8% on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B, respectively, across five challenging mathematical reasoning datasets. Moreover, CoRT significantly enhances efficiency, reducing token usage by approximately 30% for the 32B model and 50% for the 1.5B model compared to pure natural language reasoning baselines. The models and code are available at: https://github.com/ChengpengLi1003/CoRT.
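The "multi-round interleaving of external CI usage and internal thinking" can be pictured with a small loop: the model emits a thinking turn that may contain code, the code runs in a sandboxed interpreter, and the output is fed back into the transcript for the next turn. The `run_code` helper, the `(text, code)` turn format, and the fixed `model_turns` list are all stand-ins I invented for illustration; a real system would call the LRM each round with the growing transcript.

```python
# Minimal sketch of multi-round LRM-CI interleaving.
# The turn format and helpers are assumptions, not CoRT's actual inference loop.
import contextlib
import io

def run_code(code: str) -> str:
    """Execute a Python snippet and capture its stdout, standing in for a CI."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
    except Exception as exc:  # surface errors back to the "model"
        return f"Error: {exc}"
    return buf.getvalue().strip()

def solve(model_turns, max_rounds=4):
    """Interleave 'thinking' turns with CI results.

    `model_turns` is a canned list of (text, code_or_None) pairs playing the
    role of the LRM; each code snippet's output is appended to the transcript.
    """
    transcript = []
    for text, code in model_turns[:max_rounds]:
        transcript.append(text)
        if code is not None:
            transcript.append(f"[CI output] {run_code(code)}")
    return transcript

turns = [
    ("Heavy arithmetic ahead; delegate the sum of squares 1..100 to the CI.",
     "print(sum(i * i for i in range(1, 101)))"),
    ("The CI returns 338350, so that is the answer.", None),
]
print(solve(turns))
```

The efficiency gains reported above come from exactly this kind of delegation: one short code turn replaces a long chain of token-hungry manual arithmetic in the model's natural language reasoning.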