SCRIBE is amazing! It's research that seriously levels up how LLMs use tools ☆
Super-short summary: it's a framework that cleverly supports LLMs in mastering tool use 💕
✨ Sparkle highlights ✨
● A three-level structure organizes the LLM's thinking, just like a smart gal ♪
● Skill prototypes (skill templates) apparently make LLM-based evaluation way more stable!
● "Mid-level abstraction" really clicks! Total understanding boost, right?
Here come the details!
Background: If you let an LLM (large language model) use all kinds of tools, it can do so much more, right? Like searching the web or writing code... but getting an LLM to use tools smartly is actually pretty hard 😢 It tends to slip up on things like which tool to pick and how to combine the results 💦
Training reliable tool-augmented agents remains a significant challenge, largely due to the difficulty of credit assignment in multi-step reasoning. While process-level reward models offer a promising direction, existing LLM-based judges often produce noisy and inconsistent signals because they lack fine-grained, task-specific rubrics to distinguish high-level planning from low-level execution. In this work, we introduce SCRIBE (Skill-Conditioned Reward with Intermediate Behavioral Evaluation), a reinforcement learning framework that intervenes at a novel mid-level abstraction. SCRIBE grounds reward modeling in a curated library of skill prototypes, transforming open-ended LLM evaluation into a constrained verification problem. By routing each subgoal to a corresponding prototype, the reward model is equipped with precise, structured rubrics that substantially reduce reward variance. Experimental results show that SCRIBE achieves state-of-the-art performance across a range of reasoning and tool-use benchmarks. In particular, it improves the AIME25 accuracy of a Qwen3-4B model from 43.3% to 63.3%, and significantly increases success rates in complex multi-turn tool interactions. Further analysis of training dynamics reveals a co-evolution across abstraction levels, where mastery of mid-level skills consistently precedes the emergence of effective high-level planning behaviors. Finally, we demonstrate that SCRIBE is additive to low-level tool optimizations, providing a scalable and complementary pathway toward more autonomous and reliable tool-using agents.
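To make the core idea concrete, here is a minimal sketch of the routing-and-rubric mechanism the abstract describes: each subgoal is routed to a skill prototype, and the reward is computed against that prototype's structured rubric rather than by open-ended judging. All names (`SkillPrototype`, `route`, `score`, the keyword-overlap matcher) are my own illustrative assumptions, not the paper's actual implementation, which likely uses a learned router and an LLM judge.

```python
# Hypothetical sketch of SCRIBE-style skill-conditioned reward (assumed API, not
# the paper's code): route a subgoal to a skill prototype, then score the step
# as the fraction of that prototype's rubric items that pass.
from dataclasses import dataclass


@dataclass
class SkillPrototype:
    name: str
    keywords: set   # crude routing signal (assumption; the paper likely learns this)
    rubric: list    # checklist items the reward model verifies

PROTOTYPES = [
    SkillPrototype("web_search", {"search", "lookup", "find"},
                   ["query names the target entity", "result is cited in the answer"]),
    SkillPrototype("code_exec", {"compute", "run", "calculate"},
                   ["code is syntactically valid", "output is used downstream"]),
]


def route(subgoal: str) -> SkillPrototype:
    """Pick the prototype whose keywords best overlap the subgoal text."""
    words = set(subgoal.lower().split())
    return max(PROTOTYPES, key=lambda p: len(p.keywords & words))


def score(subgoal: str, checks_passed: dict) -> float:
    """Constrained verification: reward is the pass rate on the routed rubric."""
    proto = route(subgoal)
    passed = sum(checks_passed.get(item, False) for item in proto.rubric)
    return passed / len(proto.rubric)
```

The design point this illustrates: because every subgoal is scored against a fixed, finite checklist, the judge's output variance is bounded by the rubric size, which is how SCRIBE turns open-ended evaluation into a constrained verification problem.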