Ultra-short summary: LLM STEM skills get a massive boost from failure analysis! Data and algorithms, the ultimate tag team 😎
✨ Sparkle highlights ✨ ● No fear of failure! The LLM learns a ton from its own failure cases! ● It's specialized for STEM (Science, Technology, Engineering, Mathematics) — how cool is that? 😍 ● The "data-algorithm co-design" idea of designing data and algorithms together is refreshingly new!
Details ● Background: LLMs (large language models) are amazing, but STEM has still been a weak spot for them 💦 Methods like SFT and RL have been pushing them along, and this work sets out to strengthen STEM reasoning even further!
● Method: The key is a framework that embraces failure! The model's mistakes are analyzed, and training then focuses on exactly those weak spots. A high-quality STEM dataset is also prepared, with the data and the algorithm designed together 🤭
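The failure-driven idea described above — evaluate the model, collect what it gets wrong, and build the next training round around those cases — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation; `failure_driven_round`, `toy_solve`, and the example fields are invented for demonstration.

```python
def failure_driven_round(model_solve, problems, build_training_example):
    """One round: find the model's failures, turn them into new training data."""
    failures = []
    for problem in problems:
        prediction = model_solve(problem["question"])
        if prediction != problem["answer"]:
            failures.append(problem)
    # Synthesize targeted training examples around the failure cases.
    return [build_training_example(p) for p in failures]

# Toy usage: a "model" that can only add, tested on mixed arithmetic.
problems = [
    {"question": (1, 2, "+"), "answer": 3},
    {"question": (2, 3, "*"), "answer": 6},
]

def toy_solve(q):
    a, b, _ = q
    return a + b  # always adds, so it fails on the multiplication problem

new_data = failure_driven_round(
    toy_solve, problems,
    lambda p: {"input": p["question"], "target": p["answer"]},
)
print(len(new_data))  # only the failed multiplication problem is kept
```

The resulting `new_data` would then feed a second training stage that concentrates on the failure region.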
We present Logics-STEM, a state-of-the-art reasoning model fine-tuned on Logics-STEM-SFT-Dataset, a high-quality and diverse dataset at 10M scale that represents one of the largest-scale open-source long chain-of-thought corpora. Logics-STEM targets reasoning tasks in the domains of Science, Technology, Engineering, and Mathematics (STEM), and exhibits exceptional performance on STEM-related benchmarks with an average improvement of 4.68% over the next-best model at 8B scale. We attribute the gains to our data-algorithm co-design engine, in which data and algorithms are jointly optimized to fit a gold-standard distribution behind reasoning. Data-wise, the Logics-STEM-SFT-Dataset is constructed by a meticulously designed data curation engine with five stages — annotation, deduplication, decontamination, distillation, and stratified sampling — to ensure quality, diversity, and scalability. Algorithm-wise, our failure-driven post-training framework leverages targeted knowledge retrieval and data synthesis around model failure regions in the Supervised Fine-tuning (SFT) stage, effectively guiding a second SFT stage or reinforcement learning (RL) to better fit the target distribution. The superior empirical performance of Logics-STEM reveals the vast potential of combining large-scale open-source data with carefully designed synthetic data, underscoring the critical role of data-algorithm co-design in enhancing reasoning capabilities through post-training. We make both the Logics-STEM models (8B and 32B) and the Logics-STEM-SFT-Dataset (10M and downsampled 2.2M versions) publicly available to support future research in the open-source community.
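To make the five curation stages concrete, here is a minimal sketch of such a pipeline. The stage logic (hashing for deduplication, benchmark-overlap filtering for decontamination, per-domain caps for stratified sampling) is a plausible illustration under stated assumptions, not the authors' actual engine; `curate` and all field names are invented.

```python
import hashlib
import random
from collections import defaultdict

def curate(examples, benchmark_questions, per_domain=2, seed=0):
    """Illustrative 5-stage curation: annotate, dedup, decontaminate,
    distill (stubbed), and stratified-sample by domain."""
    # 1) Annotation: tag each example with a domain label (stubbed here).
    for ex in examples:
        ex.setdefault("domain", "unknown")
    # 2) Deduplication: drop exact duplicates via a content hash.
    seen, unique = set(), []
    for ex in examples:
        h = hashlib.sha256(ex["question"].encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(ex)
    # 3) Decontamination: remove overlap with evaluation benchmarks.
    clean = [ex for ex in unique if ex["question"] not in benchmark_questions]
    # 4) Distillation would generate long chain-of-thought answers with a
    #    teacher model; stubbed as a placeholder field here.
    for ex in clean:
        ex.setdefault("cot", "<teacher-generated reasoning>")
    # 5) Stratified sampling: cap examples per domain for balance.
    by_domain = defaultdict(list)
    for ex in clean:
        by_domain[ex["domain"]].append(ex)
    rng = random.Random(seed)
    sample = []
    for items in by_domain.values():
        rng.shuffle(items)
        sample.extend(items[:per_domain])
    return sample

# Toy usage: one duplicate, one benchmark-contaminated item.
raw = [
    {"question": "1+1=?", "domain": "math"},
    {"question": "1+1=?", "domain": "math"},       # duplicate, dropped
    {"question": "What is DNA?", "domain": "bio"},
    {"question": "leaked item", "domain": "math"},  # in benchmark, dropped
]
curated = curate(raw, benchmark_questions={"leaked item"})
print(len(curated))  # 2: one math example, one bio example
```

In a real pipeline, each stage would of course operate at 10M scale with fuzzy deduplication and n-gram decontamination rather than these exact-match stand-ins.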