Published: 2025/8/22 17:27:09

Predicting the Future of LLMs! A Magical Way to Estimate Task Performance at Lightning Speed 🧙‍♀️

Big news: researchers found a new way to predict the performance of LLMs (large language models) while keeping compute costs low! To estimate task accuracy, they predict performance from the model's size and the amount of training data ✨

✨ Gal-Style Sparkle Points ✨
● Two-step prediction pinpoints accuracy for each individual task 🎯
● Like a staircase! The "model ladder" lets you try out lots of model sizes 🪜
● Smart enough to predict correctly even for overtrained models (ones trained well past the compute-optimal token count)! 💖

Let's dive into the details~!

Background: Training an LLM costs a crazy amount of money 💸 You'd love to try out lots of model sizes and training-data amounts, but doing all of them would bankrupt you 😭 That's why we need techniques that can predict "how well will this model do?" from just a small amount of information!
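The two-step chain described in the abstract can be sketched in code: step 1 predicts an intermediate task loss from model size N and data size D with a power law, and step 2 maps that loss to task accuracy with a sigmoid. This is a minimal illustration of the functional forms only; all coefficient values below are made-up placeholders, not the paper's fitted parameters.

```python
import numpy as np

# Step 1: predict an intermediate task loss from model size N (parameters)
# and training data size D (tokens) using a standard power-law form.
# E, A, alpha, B, beta are illustrative placeholders, not fitted values.
def task_loss(N, D, E=0.5, A=120.0, alpha=0.3, B=1100.0, beta=0.3):
    return E + A / N**alpha + B / D**beta

# Step 2: map the predicted loss to task accuracy with a sigmoid,
# since ranked-classification accuracy saturates at both ends.
# b is roughly the random-chance floor for a multiple-choice task.
def accuracy_from_loss(L, a=0.75, L0=1.0, k=4.0, b=0.25):
    return a / (1.0 + np.exp(k * (L - L0))) + b

# Chained prediction for a hypothetical 7B-parameter / 4T-token model.
# In the paper, the two functions are fitted to data points collected
# from small "ladder" models before being applied to the target model.
N, D = 7e9, 4e12
L_pred = task_loss(N, D)
acc_pred = accuracy_from_loss(L_pred)
print(f"predicted loss = {L_pred:.3f}, predicted accuracy = {acc_pred:.3f}")
```

The key design choice mirrored here is that a single power law in loss, not accuracy, carries the scaling behavior; accuracy is a bounded, saturating function of loss, so fitting it directly with a power law would fail.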


Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia / Jiacheng Liu / Alexander Wettig / David Heineman / Oyvind Tafjord / Ananya Harsh Jha / Luca Soldaini / Noah A. Smith / Dirk Groeneveld / Pang Wei Koh / Jesse Dodge / Hannaneh Hajishirzi

We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting. Standard power laws for language modeling loss cannot accurately model task performance. Therefore, we leverage a two-step prediction approach: (1) use model and data size to predict an intermediate loss, then (2) use it to predict task performance. We train a set of small-scale "ladder" models, collect data points to fit the parameterized functions of the two prediction steps, and make predictions for two target models: a 7B model trained to 4T tokens and a 13B model trained to 5T tokens. Training the ladder models only costs 1% of the compute used for the target models. On four multiple-choice tasks formatted as ranked classification, we can predict the accuracy of both target models within 2 points of absolute error. We find that tasks with higher prediction error also have higher variance in the metrics over model checkpoints. We also contrast multiple design choices for predicting accuracy, and present recommendations for extending our method to new models and tasks.

cs / cs.CL / cs.AI