ASAP: Pruning the Unsurprising for Efficient LLM Reasoning via First-Token Surprisal

Published: 2026/1/8 14:09:36

Blazing-fast LLMs with ASAP! 🚀✨

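Title & super-summary: Making LLM reasoning more efficient with ASAP!
Gyaru-style sparkle points ✨
- Cuts the waste (redundancy) out of CoT (Chain-of-Thought)!
- Looks at how "surprising" the first token (roughly, the first word) of each reasoning step is 👀
- Better accuracy than existing methods, and lower training and inference cost too!
Detailed explanation
- Background: LLMs (large language models) are amazing, but their high compute cost has been a real headache 💦 CoT makes them smarter, but the reasoning text tends to get really long, which is a pain 😩
- Method: A new framework called ASAP! It uses the question-and-answer pair as a reference point (anchor) to cut the fluff out of the CoT ✂️! Then it apparently uses the "surprisal" of the first token of each reasoning step to work out which steps really matter 🤔
- Results: It seems to beat other methods on a range of benchmarks 💖 Compute cost goes down too, so it's smart AND fast! The best! ✨
- Significance (the seriously cool ♡ point): A groundbreaking way to make LLMs easier to use! 🤖✨ LLMs could well become faster and cheaper to use across all kinds of services!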
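For anyone wondering what "surprisal" means here: it is the standard information-theoretic score, the negative log-probability a model assigns to a token. A first-token score for reasoning step s_i could be written roughly as below. The notation (q for the question, t_{i,1} for the first token of step s_i) and the exact conditioning context are assumptions for illustration, not taken from the paper.

```latex
S(s_i) = -\log p_\theta\left(t_{i,1} \mid q,\, s_1, \dots, s_{i-1}\right)
```

A high S(s_i) means the model did not expect the way step s_i begins, which is the signal used to flag a step as logically important.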
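Real-world use ideas 💡
- Chatbots get blazing fast & cheaper! 🚀 Faster answers and lower prices mean users and companies are both happy 🥰
- An AI teacher is born!? 👩‍🏫✨ Breeze through tough subjects and boost your understanding! One-on-one tutoring might not be just a dream!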


Pruning the Unsurprising: Efficient LLM Reasoning via First-Token Surprisal

Wenhao Zeng / Yaoning Wang / Chao Hu / Yuling Shi / Chengcheng Wan / Hongyu Zhang / Xiaodong Gu

Large Reasoning Models (LRMs) have demonstrated remarkable capabilities by scaling up the length of Chain-of-Thought (CoT). However, excessively long reasoning traces pose substantial challenges for training cost and inference latency. While various CoT compression approaches have emerged to address this challenge, they face inherent trade-offs: token-level methods often disrupt syntactic and logical coherence, while step-level methods based on perplexity fail to reliably capture the logically critical reasoning steps because of the dilution of logical information. In this paper, we propose ASAP (Anchor-guided, SurprisAl-based Pruning), a novel coarse-to-fine framework for CoT compression. ASAP first performs anchor-guided pruning to preserve the core reasoning structure, which efficiently reduces the search space for subsequent processing. Leveraging the insight that logical branching choices are concentrated at the onset of reasoning steps, it then enables logic-aware pruning by selecting logically essential reasoning steps based on a novel first-token surprisal metric. Finally, ASAP distills the models to autonomously generate and leverage these concise CoTs at inference time, enabling efficient reasoning. Experiments show that ASAP achieves state-of-the-art accuracy across multiple benchmarks while substantially reducing training and inference costs.
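The surprisal-based selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Step` class, the `prune_steps` function, the greedy budget rule, and the probabilities in the example are all hypothetical, and in practice the first-token probabilities would come from a language model rather than being typed in by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    text: str                # the reasoning step as plain text
    first_token_prob: float  # p(first token | context); assumed to come from an LM
    num_tokens: int          # token length of the step


def first_token_surprisal(prob: float) -> float:
    """Surprisal in nats: -log p. Low-probability first tokens score high."""
    return -math.log(prob)


def prune_steps(steps: list[Step], token_budget: int) -> list[Step]:
    """Greedily keep the most surprising steps that fit in the budget (hypothetical rule)."""
    # Rank step indices from most to least surprising first token.
    ranked = sorted(
        range(len(steps)),
        key=lambda i: first_token_surprisal(steps[i].first_token_prob),
        reverse=True,
    )
    kept, used = set(), 0
    for i in ranked:
        if used + steps[i].num_tokens <= token_budget:
            kept.add(i)
            used += steps[i].num_tokens
    # Restore the original order so the pruned CoT still reads coherently.
    return [steps[i] for i in sorted(kept)]


# Made-up example data: steps that open with an unexpected token get low probability.
steps = [
    Step("Let n be the input length.", 0.90, 7),
    Step("Wait, the loop is off by one.", 0.05, 8),
    Step("So we iterate over the range.", 0.80, 7),
    Step("Alternatively, use a prefix sum.", 0.10, 7),
]
pruned = prune_steps(steps, token_budget=15)
print([s.text for s in pruned])
```

With these made-up numbers, the two steps that open with an unexpected token ("Wait", "Alternatively") are kept and the two predictable steps are dropped. Ranking by the first token only, rather than by average perplexity over the whole step, reflects the abstract's point that branching decisions are concentrated at the start of a step.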

cs / cs.LG / cs.SE
