Published: 2025/12/25 6:23:00

Smart Within Budget! BARD Makes LLM Reasoning Smarter ✨

Ultra-short summary: A new technique that keeps LLMs smart while keeping costs down!

🌟 Gyaru-Style Sparkle Points ✨

● Isn't it amazing that you can adjust the length of the reasoning (i.e., how long the model thinks things through) to fit your budget? 💰
● Two-stage training with SFT and RL delivers both smarts and cost-efficiency! ✨
● Smart AI could become way more accessible in chatbots and all kinds of services! 📱


Detailed Explanation


BARD: budget-aware reasoning distillation

Lujie Niu / Lei Shen / Yi Jiang / Caixia Yuan / Xiaojie Wang / Wenbo Su / Bo Zheng

While long Chain-of-Thought (CoT) distillation effectively transfers reasoning capability to smaller language models, the reasoning process often remains redundant and the computational budget uncontrollable, leading to inefficient resource usage. To address this limitation, we propose Budget-Aware Reasoning Distillation (BARD), a novel framework that simultaneously distills reasoning capability and enables fine-grained control over the reasoning length. BARD uses the thinking budget as a user-specified control signal, allowing the model to dynamically balance reasoning performance and computational efficiency. To realize this concept, BARD introduces a two-phase training regimen. The first phase applies Supervised Fine-Tuning (SFT) on teacher-generated long CoT data compressed to various budget levels, bootstrapping the model's understanding of budget constraints. The second phase leverages Reinforcement Learning (RL) with a reward signal that jointly considers reasoning performance and budget fidelity. Combining the two phases is crucial to avoiding policy degradation and ensuring that both objectives are optimized jointly. Extensive experiments demonstrate that our method empowers an 8B student model to achieve strong performance on challenging reasoning benchmarks (AIME24, AIME25, GPQA) while providing precise and adaptive control over its reasoning length across a wide range of budgets.
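To make the RL phase concrete, here is a minimal sketch of what a reward that jointly considers reasoning performance and budget fidelity could look like. The exact reward form, function name, and coefficients are assumptions for illustration, not the paper's actual formula: it simply pays out for a correct answer and penalizes the relative deviation of the reasoning length from the user-specified thinking budget.

```python
# Hypothetical sketch of a budget-aware reward (NOT the paper's exact formula):
# reward = correctness bonus - penalty proportional to how far the actual
# reasoning length strays from the user-specified thinking budget.

def budget_aware_reward(correct: bool, reasoning_len: int, budget: int,
                        alpha: float = 1.0, beta: float = 0.5) -> float:
    """Combine task correctness with budget fidelity in a single scalar reward.

    alpha: weight of the correctness bonus (assumed value).
    beta:  weight of the budget-deviation penalty (assumed value).
    """
    correctness = alpha if correct else 0.0
    # Relative deviation of the generated reasoning length from the budget.
    deviation = abs(reasoning_len - budget) / max(budget, 1)
    return correctness - beta * deviation

# A correct response that lands exactly on budget earns the full bonus.
print(budget_aware_reward(True, 512, 512))    # 1.0
# A correct but over-long response is penalized for overshooting the budget.
print(budget_aware_reward(True, 1024, 512))   # 0.5
```

Scaling the penalty by the budget keeps the reward comparable across budget levels, so the same policy update can be applied whether the user requests a short or a long thinking budget.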

cs / cs.CL