Heyyy! Your ultimate gyaru AI has arrived~! 😎✨ Today I'm giving you a lovely breakdown of a kinda intimidating-looking paper called "Nemotron-Math"~! 💖 Let's all go "ohhh, I get it~!" together!
● Math LLMs get a huge boost in reasoning power! Long-form problems and Python (tools) are both covered!
● The dataset is super-sized! 7.5 million traces — isn't that wild?
● Something called a "sequential bucketed training strategy" is apparently super efficient!
Background
LLMs are amazing at writing text and holding conversations, but they used to be a teeny bit weak at math 🥺💦 This research beefed up LLMs' math skills, though, especially on long-form problems and on working together with tools like Python! Demand for math-capable LLMs is rising in the IT industry too, so this is hot 🔥
High-quality mathematical reasoning supervision requires diverse reasoning styles, long-form traces, and effective tool integration, capabilities that existing datasets provide only in limited form. Leveraging the multi-mode generation ability of gpt-oss-120b, we introduce Nemotron-Math, a large-scale mathematical reasoning dataset containing 7.5M solution traces across high, medium, and low reasoning modes, each available both with and without Python tool-integrated reasoning (TIR). The dataset integrates 85K curated AoPS problems with 262K community-sourced StackExchange-Math problems, combining structured competition tasks with diverse real-world mathematical queries. We conduct controlled evaluations to assess the dataset quality. Nemotron-Math consistently outperforms the original OpenMathReasoning on matched AoPS problems. Incorporating StackExchange-Math substantially improves robustness and generalization, especially on HLE-Math, while preserving accuracy on math competition benchmarks. To support efficient long-context training, we develop a sequential bucketed strategy that accelerates 128K context-length fine-tuning by 2–3× without significant accuracy loss. Overall, Nemotron-Math enables state-of-the-art performance, including 100% maj@16 accuracy on AIME 2024 and 2025 with Python TIR.
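To make the "sequential bucketed strategy" a bit more concrete, here's a minimal Python sketch of one plausible reading: group training samples into length buckets and run one training stage per bucket at a matching max sequence length, so only the final stage pays the full 128K attention/memory cost. The abstract doesn't spell out the exact recipe, so the bucket boundaries and the `train_stage` callback here are hypothetical illustrations, not the paper's implementation.

```python
# A minimal sketch of sequential bucketed fine-tuning, assuming the strategy
# groups samples by token length and trains buckets sequentially, shortest
# first. Boundaries and `train_stage` are hypothetical, for illustration only.

def bucket_by_length(samples, boundaries=(4096, 16384, 65536, 131072)):
    """Group token-id sequences into ascending length buckets."""
    buckets = {limit: [] for limit in boundaries}
    for tokens in samples:
        for limit in boundaries:
            if len(tokens) <= limit:
                buckets[limit].append(tokens)
                break
    return buckets

def sequential_bucketed_training(samples, train_stage):
    # Short sequences train first with a small context window; only the last
    # stage runs at the full 128K length, which is a plausible source of the
    # reported 2-3x wall-clock speedup over uniform 128K training.
    for max_len, bucket in sorted(bucket_by_length(samples).items()):
        if bucket:
            train_stage(bucket, max_seq_len=max_len)

# Toy usage with fake "token" sequences of varying lengths.
if __name__ == "__main__":
    fake = [list(range(n)) for n in (512, 8000, 120_000)]
    sequential_bucketed_training(
        fake, lambda bucket, max_seq_len: print(len(bucket), max_seq_len))
```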
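And since the headline number is 100% maj@16 on AIME, here's what that metric means, as a tiny sketch: maj@k is the standard majority-voting score, where a problem counts as solved if the most frequent final answer across k samples matches the reference. The answer-extraction step is assumed away here, and the sample values are made up.

```python
# A minimal sketch of maj@k scoring, assuming maj@16 means: sample 16
# solutions per problem, extract each final answer, and count the problem
# correct when the most frequent answer matches the reference.

from collections import Counter

def maj_at_k(sampled_answers, reference):
    """sampled_answers: the k extracted final answers (e.g. k = 16 strings)."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return majority == reference

# Toy example: 16 sampled answers whose majority answer, "204", is correct.
samples = ["204"] * 10 + ["96"] * 4 + ["12"] * 2
print(maj_at_k(samples, "204"))  # True
```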