Published: 2026/1/11 14:07:30

TreePS-RAG is born! The tree 🌲 that makes agentic RAG the strongest!

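Ultra-short summary: LLM RAG gets a major upgrade from a tree structure ✨ It encourages learning at every step!
Gyaru-style sparkle points ✨
- Training agentic RAG gets way more efficient thanks to a tree structure 💖
- Monte Carlo estimation nails the evaluation of intermediate steps too 👌
- Similarity pruning smartly saves on compute cost 💰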
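Detailed explanation
- Background: Retrieval-augmented generation (RAG) with LLMs is a technique that pulls information from external sources to give smarter answers 💡 But with the conventional approach, training was only scored on whether the final answer was good or bad, so the learning signal was kinda meh 😭
- Method: So they turned the reasoning process into a tree! Each step becomes a node on the tree, and Monte Carlo estimation over the outcomes that follow from it scores how good that step is 😎✨ Similarity pruning cuts the wasted computation too!
- Results: This approach pushes the intermediate steps to improve, and RAG performance shot way up 🎉 On seven multi-hop and general QA benchmarks it beat both outcome-supervised and process-supervised RL methods, so answer accuracy and reliability both went up!
- Significance: In other words, LLMs can answer more smartly and accurately! It could also boost search engine accuracy, and AI chatbots might get even better 😍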
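Curious what similarity pruning could look like? Here's a tiny Python sketch. Heads up: it's an illustration, not the paper's actual method. The summary only says that similar branches get pruned to save compute, so the token-set Jaccard similarity, the `threshold` of 0.8, the `budget` of 4, and the function names `jaccard` and `prune_similar` are all assumptions made up for this example.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two strings (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty strings count as identical
    return len(ta & tb) / len(ta | tb)


def prune_similar(candidates: List[str], threshold: float = 0.8, budget: int = 4) -> List[str]:
    """Greedily keep candidates that differ enough from the ones already kept.

    Hypothetical helper: `threshold` and `budget` are illustrative values,
    not settings taken from the paper.
    """
    kept: List[str] = []
    for cand in candidates:
        if len(kept) >= budget:
            break  # stop expanding once the branch budget is used up
        if all(jaccard(cand, k) < threshold for k in kept):
            kept.append(cand)
    return kept


# Candidate next-step search queries sampled at one tree node (toy data).
queries = [
    "who directed the film Inception",
    "who directed the film inception",  # same tokens after lowercasing
    "release year of Inception",
    "who directed Inception the film",  # same token set, different order
]
print(prune_similar(queries))
```

The second and fourth queries share the exact same token set as the first, so only the first and third survive. A real system could compare embeddings instead, but the keep-only-what's-different idea stays the same.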
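Real-world use ideas 💡
- 💡 Deploy it in an in-house enterprise search system! Employees could get to the info they want in seconds 🚀
- 💡 Put it in an AI assistant! It might become a partner that handles even complex questions 🤭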

TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG

Tianhua Zhang / Kun Li / Junan Li / Yunxiang Li / Hongyin Luo / Xixin Wu / James Glass / Helen Meng

Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval, and has recently been advanced by reinforcement learning (RL) with outcome-based supervision. While effective, relying solely on sparse final rewards limits step-wise credit assignment and provides weak guidance for intermediate reasoning and actions. Recent efforts explore process-level supervision, but typically depend on offline constructed training data, which risks distribution shift, or require costly intermediate annotations. We present TreePS-RAG, an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while retaining standard outcome-only rewards. Our key insight is to model agentic RAG reasoning as a rollout tree, where each reasoning step naturally maps to a node. This tree structure allows step utility to be estimated via Monte Carlo estimation over its descendant outcomes, yielding fine-grained process advantages without requiring intermediate labels. To make this paradigm practical, we introduce an efficient online tree construction strategy that preserves exploration diversity under a constrained computational budget. With a rollout cost comparable to strong baselines like Search-R1, experiments on seven multi-hop and general QA benchmarks across multiple model scales show that TreePS-RAG consistently and significantly outperforms both outcome-supervised and leading process-supervised RL methods.
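
To make the Monte Carlo idea concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It estimates a step's utility as the mean outcome reward over the leaves beneath its node, as the abstract describes. The `Node` class, the function names, the toy rewards, and in particular the advantage definition (a node's value minus its parent's value) are assumptions for illustration; the paper's exact formulation may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One reasoning or retrieval step in a rollout tree (hypothetical structure)."""
    name: str
    reward: Optional[float] = None  # outcome reward, set only on leaves
    children: List["Node"] = field(default_factory=list)


def leaf_rewards(node: Node) -> List[float]:
    """Collect the outcome rewards of every finished rollout below this node."""
    if not node.children:
        return [0.0 if node.reward is None else node.reward]
    out: List[float] = []
    for child in node.children:
        out.extend(leaf_rewards(child))
    return out


def mc_value(node: Node) -> float:
    """Monte Carlo estimate of step utility: mean reward over descendant leaves."""
    rewards = leaf_rewards(node)
    return sum(rewards) / len(rewards)


def step_advantages(root: Node) -> Dict[str, float]:
    """Per-step advantage, assumed here to be value(child) - value(parent)."""
    adv: Dict[str, float] = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        v_parent = mc_value(parent)
        for child in parent.children:
            adv[child.name] = mc_value(child) - v_parent
            stack.append(child)
    return adv


# Toy tree: only the final answers carry an outcome reward (1.0 = correct).
tree = Node("question", children=[
    Node("search_A", children=[Node("answer_A1", reward=1.0),
                               Node("answer_A2", reward=1.0)]),
    Node("search_B", children=[Node("answer_B1", reward=0.0),
                               Node("answer_B2", reward=1.0)]),
])

print(mc_value(tree))
print(step_advantages(tree))
```

In this toy tree, `search_A` leads only to correct answers, so it gets an advantage of 0.25 relative to the root value of 0.75, while `search_B` gets -0.25, even though no intermediate step was ever labeled.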

cs / cs.CL
