Heyyy! The ultimate gyaru AI has arrived~! 😎✨ Today I'm breaking down a super-hot paper on stabilizing LLM (large language model) training! Ready? Let's go!
Title & Ultra-Short Summary: Stop the LLM blow-up 💥! A method that stabilizes training by fixing the output embeddings ♡
Gyaru Sparkle Points ✨
● It tackles a training phenomenon called "logit divergence"! 🤯 When the logits diverge, the model becomes unstable…
● By focusing 👀 on the output embeddings (the vectors that represent each word), the authors pin down the root cause!
● They develop a super-strong stabilization technique called "OEC (Output Embedding Centering)"! ✨
Detailed Explanation
Ideas for real-world use 💡
Read the rest in the 「らくらく論文」 app
Pretraining of large language models is not only expensive but also prone to certain training instabilities. A specific instability that often occurs for large learning rates at the end of training is output logit divergence. The most widely used mitigation strategy, z-loss, merely addresses the symptoms rather than the underlying cause of the problem. In this paper, we analyze the instability from the perspective of the output embeddings' geometry and identify its cause. Based on this, we propose output embedding centering (OEC) as a new mitigation strategy, and prove that it suppresses output logit divergence. OEC can be implemented in two different ways, as a deterministic operation called μ-centering, or a regularization method called μ-loss. Our experiments show that both variants outperform z-loss in terms of training stability and learning rate sensitivity. In particular, they ensure that training converges even for large learning rates when z-loss fails. Furthermore, we find that μ-loss is significantly less sensitive to regularization hyperparameter tuning than z-loss.
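To make the contrast concrete, here is a minimal NumPy sketch of the three ideas the abstract names. This is my own reading, not the paper's reference implementation: I assume μ-centering means subtracting the mean output embedding across the vocabulary, μ-loss means penalizing the squared norm of that mean, and z-loss means penalizing the squared log-partition function of the logits; the coefficient values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy output embedding matrix: vocab_size x hidden_dim,
# deliberately offset so the mean embedding is far from zero.
W = rng.normal(size=(8, 4)) + 3.0

def mu_centering(W):
    # μ-centering (sketch): deterministically subtract the mean
    # embedding across the vocabulary, so the embeddings sum to zero.
    return W - W.mean(axis=0, keepdims=True)

def mu_loss(W, coeff=1e-2):
    # μ-loss (sketch): a soft version of the same idea, regularizing
    # the squared norm of the mean embedding toward zero during training.
    mu = W.mean(axis=0)
    return coeff * float(np.dot(mu, mu))

def z_loss(logits, coeff=1e-4):
    # z-loss baseline (sketch): penalize (log Z)^2, where Z is the
    # softmax partition function, to keep logit magnitudes bounded.
    log_z = np.log(np.exp(logits).sum(axis=-1))
    return coeff * float((log_z ** 2).mean())

W_centered = mu_centering(W)
print(np.abs(W_centered.mean(axis=0)).max())  # near zero: mean removed
```

The key difference the abstract emphasizes: z-loss constrains the symptom (logit magnitude via log Z), while both OEC variants act directly on the output embeddings' geometry, either as a hard operation (μ-centering) or a soft penalty (μ-loss).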