Boosting LLM Math Skills! The Secret Behind Data Synthesis Strategies💖

Published: 2025/8/22 16:37:40

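Super summary: To level up LLMs' math skills, the researchers compared a whole bunch of data synthesis methods under one shared setup and figured out what works best✨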
Gal-style sparkle points✨
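- They tried out tons of data synthesis methods (= artificially generating training data) to boost LLMs' math skills! Kinda like a fashion magazine trying on a bunch of different outfits? 👗
- The paper pinned down which kind of data synthesis works best: agents that make problems more complex gave the biggest gains on most math metrics! Like meeting your soulmate cosmetics💖 LLM math skills, way up!
- With LLMs that are better at math, all kinds of services (like education and finance) could get even more amazing! Can't wait to see the future🥰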
Detailed explanation
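- Background: Boosting the math abilities of LLMs (large language models) is super important! But human-written data only gets you so far, so people have been turning to synthetic (artificially generated) data✨ The catch: earlier works each used their own unique setups, so nobody could fairly compare which synthesis strategy actually works best. That's exactly what this study set out to check!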
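- Method: The paper introduces a framework called "FLAMES" and uses it for a systematic study: they swapped data synthesis strategies in and out and tweaked other factors, like filtering out low-quality problems, then checked how the LLM's math skills changed💖 They compared 10 existing synthesis strategies! Impressive!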
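- Results: Across all the experiments, some clear takeaways showed up! Data agents designed to make problems more complex gave the best improvements on most math metrics, and with a fixed data generation budget, keeping more problems (higher coverage) mattered more than keeping only problems with reliable solutions👀✨ Synthetic data built from GSM8K and MATH even helped on competition-level benchmarks, so LLMs trained on it got better at harder math problems too!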
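- Significance (this is the wild♡ part): Thanks to this research, LLM math skills could seriously skyrocket! Better math means LLMs could help with stuff like AI tutoring and financial risk analysis✨ Can't wait for a future where LLMs get even smarter and support us in everyday life!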
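Here's a tiny Python sketch of the "same setup for everyone" idea behind the Background and Method bullets. It is NOT the FLAMES code: the strategy functions `rephrase` and `make_harder`, the `build_datasets` helper, and the toy (question, answer) problems are all hypothetical stand-ins for LLM-based generators. It only shows how every strategy gets the same seed problems, the same budget, and the same random seed, so any difference downstream comes from the strategy itself. Fine-tuning and benchmark evaluation are left out.

```python
import random
from typing import Callable, Dict, List, Tuple

# Toy problem format (hypothetical): (question text, integer answer).
Problem = Tuple[str, int]
# Hypothetical strategy signature: turn one seed problem into one new problem.
Strategy = Callable[[Problem, random.Random], Problem]


def rephrase(seed: Problem, rng: random.Random) -> Problem:
    """Toy stand-in for a 'rephrase the seed' strategy: same math, new wording."""
    question, answer = seed
    return (f"Rephrased: {question}", answer)


def make_harder(seed: Problem, rng: random.Random) -> Problem:
    """Toy stand-in for a complexity-raising agent: bolt one extra step onto the seed."""
    question, answer = seed
    k = rng.randint(2, 9)
    return (f"{question} Then multiply the result by {k}.", answer * k)


def build_datasets(seeds: List[Problem], strategies: Dict[str, Strategy],
                   budget: int, rng_seed: int = 0) -> Dict[str, List[Problem]]:
    """Give every strategy the same seeds, the same budget and the same RNG seed,
    so that comparisons reflect the strategy rather than the setup."""
    datasets = {}
    for name, strategy in strategies.items():
        rng = random.Random(rng_seed)  # identical randomness for each strategy
        datasets[name] = [strategy(seeds[i % len(seeds)], rng) for i in range(budget)]
    return datasets


seeds = [("What is 3 + 4?", 7), ("What is 10 - 6?", 4)]
datasets = build_datasets(seeds, {"rephrase": rephrase, "make_harder": make_harder}, budget=5)
for name, problems in datasets.items():
    print(name, len(problems), problems[0])
```

In a real study you would then fine-tune one model per dataset and score them all on the same benchmarks; that part depends on your training stack and isn't sketched here.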
Real-world use ideas💡
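- AI tutors might be able to give you tougher problems!✨ They'd match your level, so studying would go way smoother!
- We might be able to build powerful analysis tools like the ones finance pros use!✨ Maybe you'd even get better at investing!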

FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline

Parker Seegmiller / Kartik Mehta / Soumya Saha / Chenyang Tao / Shereen Oraby / Arpit Gupta / Tagyoung Chung / Mohit Bansal / Nanyun Peng

Recent works improving LLM math reasoning with synthetic data have used unique setups, making comparison of data synthesis strategies impractical. This leaves many unanswered questions about the roles of different factors in the synthetic data pipeline, such as the impact of filtering low-quality problems. To address this gap, we introduce FLAMES, a Framework for LLM Assessment of Math rEasoning Data Synthesis, and perform a systematic study of 10 existing data synthesis strategies and multiple other factors impacting the performance of synthetic math reasoning data. Our FLAMES experiments provide several valuable insights about the optimal balance of difficulty and diversity of synthetic data. First, data agents designed to increase problem complexity lead to best improvements on most math metrics. Second, with a fixed data generation budget, keeping higher problem coverage is more important than keeping only problems with reliable solutions. Third, GSM8K- and MATH-based synthetic data can lead to improvements on competition-level benchmarks, showcasing easy-to-hard generalization. Leveraging insights from our FLAMES experiments, we design two novel data synthesis strategies for improving out-of-domain generalization and robustness. Further, we develop the FLAMES dataset, an effective blend of our novel and existing data synthesis strategies, outperforming public datasets on OlympiadBench (+15.7), CollegeMath (+4.5), GSMPlus (+6.5), and MATH (+3.1). Fine-tuning Qwen2.5-Math-7B on the FLAMES dataset achieves 81.4% on MATH, surpassing larger Llama3 405B, GPT-4o and Claude 3.5 Sonnet.

cs / cs.LG / cs.AI / cs.CL
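The abstract's second insight (under a fixed budget, higher problem coverage beats keeping only problems with reliable solutions) can be pictured with a small Python sketch. Assumption alert: the abstract doesn't say how "reliable" was judged, so this uses majority voting over several sampled answers (self-consistency style) as a stand-in, and `majority_answer`, `keep_reliable`, `keep_coverage`, and the toy samples are all hypothetical. The point is just that a strict reliability filter throws away problems you already paid to generate, while the coverage policy keeps them all.

```python
from collections import Counter
from typing import Dict, List, Tuple


def majority_answer(answers: List[str]) -> Tuple[str, float]:
    """Most common sampled answer plus the fraction of samples agreeing with it."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)


def keep_reliable(samples: Dict[str, List[str]], min_agree: float) -> Dict[str, str]:
    """'Reliable solutions' policy (assumed): drop problems whose samples disagree too much."""
    kept = {}
    for problem, answers in samples.items():
        answer, agree = majority_answer(answers)
        if agree >= min_agree:
            kept[problem] = answer
    return kept


def keep_coverage(samples: Dict[str, List[str]]) -> Dict[str, str]:
    """'Coverage' policy (assumed): keep every problem, labeled with its majority answer."""
    return {problem: majority_answer(answers)[0] for problem, answers in samples.items()}


# Same budget for both policies: 3 problems x 4 sampled answers (toy data, not from the paper).
samples = {
    "p1": ["7", "7", "7", "7"],
    "p2": ["4", "4", "5", "4"],
    "p3": ["12", "9", "12", "15"],
}
print(len(keep_reliable(samples, min_agree=1.0)))  # → 1
print(len(keep_coverage(samples)))                 # → 3
```

Per the abstract, it's the coverage-style choice that came out ahead in the FLAMES experiments; the exact filtering rule they compared against is in the paper.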
