1. An LLM born with zero data?! The LSP shockwave 🔥
2. Gal-Style Sparkle Points ✨
● Collecting tons of data is sooo last season! With LSP, zero data is totally fine!
● The LLM learns all by itself, so it picks up specialized knowledge easy-peasy!
● Basically, it's a super-amazing technique that could change the future of the IT industry!
3. Detailed Explanation
4. Real-World Use-Case Ideas 💡
💡 Build a chatbot with expert knowledge, lightning fast, with no data at all!
💡 Companies might be able to generate high-quality content with ease!
Large language models (LLMs) have advanced rapidly in recent years, driven by scale, abundant high-quality training data, and reinforcement learning. Yet this progress faces a fundamental bottleneck: the need for ever more data from which models can continue to learn. In this work, we propose a reinforcement learning approach that removes this dependency by enabling models to improve without additional data. Our method leverages a game-theoretic framework of self-play, where a model's capabilities are cast as performance in a competitive game and stronger policies emerge by having the model play against itself, a process we call Language Self-Play (LSP). Experiments with Llama-3.2-3B-Instruct on instruction-following, mathematics, and coding benchmarks show that pretrained models can be effectively improved with self-play alone.
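The abstract's core idea, a single model improving by playing a competitive game against itself with no external data, can be caricatured in a few lines of Python. Everything below (the `ToyPolicy` class, the scalar "skill" stand-in for model capability, the logistic success model, and the reward/update rule) is a hypothetical illustration of the general self-play pattern, not the paper's actual LSP algorithm.

```python
import math
import random

class ToyPolicy:
    """Stand-in for an LLM policy, reduced to a single 'skill' scalar.
    (An assumption for illustration: real LSP trains an actual language model.)"""
    def __init__(self, skill=0.1):
        self.skill = skill

    def propose_task(self, rng):
        # Challenger role: propose a task near the edge of current ability.
        return self.skill + rng.uniform(-0.05, 0.15)

    def solve(self, difficulty, rng):
        # Solver role: success probability falls off as difficulty exceeds
        # skill (a logistic curve, an arbitrary modeling choice here).
        return rng.random() < 1.0 / (1.0 + math.exp(10.0 * (difficulty - self.skill)))

def self_play_train(policy, steps=500, lr=0.01, seed=0):
    """One model plays both roles; no external data enters the loop."""
    rng = random.Random(seed)
    for _ in range(steps):
        difficulty = policy.propose_task(rng)
        if policy.solve(difficulty, rng) and difficulty > policy.skill:
            # Reward: solving a harder-than-current task nudges skill upward.
            policy.skill += lr * (difficulty - policy.skill)
    return policy

policy = self_play_train(ToyPolicy())
```

After training, `policy.skill` has drifted upward purely from self-generated tasks, mirroring the abstract's claim that the model can improve without additional data.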