Title: Fast LLMs! Efficient-DLM is born 🚀 Super summary: It overcomes AR's weakness — blazing-fast dLMs to speed up your business!
Gyaru-style sparkle points ✨
● It fixes the "AR models are slow" problem! ✨
● It makes dLMs smarter while cutting training cost too! 🤩
● It just feels ready for all kinds of business uses! 💕
Detailed explanation
● Background: Text-generating AI — LLMs (large language models) — are amazing, but generating text takes time, right? 😥 This latest research is about making LLMs run faster! The conventional approach, AR (autoregressive) models, produces text one token at a time in order, so it's slow 😩
● Method: The new model is a diffusion language model (dLM), which can generate tokens in parallel — huge speedup potential ✨ Training one from scratch is hard, though, so by cleverly reusing a pretrained AR model, they got the speedup while keeping training costs down! 💖
● Results: The resulting Efficient-DLM keeps accuracy while generating text way faster! ✨ It's smarter and quicker than previous models 😍 This could make all sorts of services smoother to use!
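To make the speed difference concrete, here is a toy sketch (not the paper's actual models) of why AR decoding is inherently sequential while diffusion-style decoding can update every position in parallel; the `model_step` and `denoise_all` callables are illustrative placeholders:

```python
def ar_decode(model_step, length):
    """AR generation: each new token depends on all previous ones,
    so producing `length` tokens costs `length` sequential model calls."""
    tokens = []
    for _ in range(length):
        tokens.append(model_step(tokens))  # one model call per token
    return tokens

def diffusion_decode(denoise_all, length, num_steps):
    """dLM generation: start fully masked, then refine ALL positions
    at once for a few steps, so the cost is `num_steps` model calls
    (typically num_steps << length)."""
    tokens = ["[MASK]"] * length
    for _ in range(num_steps):
        tokens = denoise_all(tokens)  # every position updated in parallel
    return tokens
```

The wall-clock win comes from the call count: `length` calls for AR versus `num_steps` calls for the dLM.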
Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained from scratch. To this end, we study AR-to-dLM conversion to transform pretrained AR models into efficient dLMs that excel in speed while preserving AR models' task accuracy. We achieve this by identifying limitations in the attention patterns and objectives of existing AR-to-dLM methods and then proposing principles and methodologies for more effective AR-to-dLM conversion. Specifically, we first systematically compare different attention patterns and find that maintaining pretrained AR weight distributions is critical for effective AR-to-dLM conversion. As such, we introduce a continuous pretraining scheme with a block-wise attention pattern, which remains causal across blocks while enabling bidirectional modeling within each block. We find that this approach can better preserve pretrained AR models' weight distributions than fully bidirectional modeling, in addition to its known benefit of enabling KV caching, and leads to a win-win in accuracy and efficiency. Second, to mitigate the training-test gap in mask token distributions (uniform vs. highly left-to-right), we propose a position-dependent token masking strategy that assigns higher masking probabilities to later tokens during training to better mimic test-time behavior. Leveraging this framework, we conduct extensive studies of dLMs' attention patterns, training dynamics, and other design choices, providing actionable insights into scalable AR-to-dLM conversion. These studies lead to the Efficient-DLM family, which outperforms state-of-the-art AR models and dLMs, e.g., our Efficient-DLM 8B achieves +5.4%/+2.7% higher accuracy with 4.5x/2.7x higher throughput compared to Dream 7B and Qwen3 4B, respectively.
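The two key ingredients above — a block-wise attention pattern (causal across blocks, bidirectional within each block) and a position-dependent masking schedule — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; in particular the linear schedule and the `p_min`/`p_max` bounds are assumptions, since the abstract only states that later tokens get higher masking probability:

```python
import numpy as np

def blockwise_attention_mask(seq_len, block_size):
    """Boolean mask (True = query may attend to key): causal across
    blocks, fully bidirectional within each block."""
    block_id = np.arange(seq_len) // block_size
    # A query in block i sees every key in blocks 0..i, including
    # ALL positions of its own block (bidirectional within-block).
    return block_id[:, None] >= block_id[None, :]

def position_dependent_mask_probs(seq_len, p_min=0.1, p_max=0.9):
    """Masking probability grows with position, so later tokens are
    masked more often during training -- mimicking the roughly
    left-to-right order in which tokens stay masked at test time.
    (Linear schedule is an illustrative assumption.)"""
    return np.linspace(p_min, p_max, seq_len)
```

Because attention stays causal at the block level, earlier blocks' key/value states never change once computed, which is what makes KV caching possible — the "known benefit" the abstract mentions.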