Ultra-short summary: a new trick for making LLMs smarter! It cuts compute cost while boosting expressiveness, and the IT world is hyped ⤴︎
✨ Gal-style sparkle points ✨ ● Cuts compute cost (the hit to your wallet 💰) while keeping the model just as smart! ● More variety in the generated text, so readers never get bored! ● IT companies can ship new services at lightning speed 💖
Detailed explanation ● Background: LLMs are amazing, but the compute is brutal 💸. Making them smarter has meant buying high-performance machines (servers). IT companies want to use LLMs more casually while keeping costs down... that was the dilemma 🥺
● Method: They got clever with the choice of activation function (the LLM's "brain") 💡. The idea is to stochastically switch between ReLU and SiLU, two functions with different strengths! Like switching between a genius and an honor student depending on the situation ✨
We introduce stochastic activations. This novel strategy randomly selects between several non-linear functions in the feed-forward layer of a large language model. In particular, we choose between SiLU and ReLU depending on a Bernoulli draw. This strategy circumvents the optimization problem associated with ReLU, namely its constant zero output for negative inputs, which blocks gradient flow. We leverage this strategy in two ways: (1) We use stochastic activations during pre-training and fine-tune the model with ReLU, which is used at inference time to provide sparse latent vectors. This reduces the inference FLOPs and translates into a significant speedup on CPU and GPU, and it leads to better results than training from scratch with the ReLU activation function. (2) We evaluate stochastic activations for sequence generation. This strategy performs reasonably well: it yields higher diversity with only slightly lower quality than the best deterministic non-linearity, SiLU, combined with temperature sampling. This provides an alternative way to increase the diversity of generated text.
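The core mechanism described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function names, the `p_relu` parameter, and the `training` flag are my own choices. It draws a Bernoulli variable per call to pick ReLU or SiLU during training, and falls back to plain ReLU at inference, matching the recipe of serving with ReLU for sparse latent vectors.

```python
import numpy as np

def relu(x):
    # Zero for negative inputs -> sparse activations at inference time.
    return np.maximum(x, 0.0)

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x); smooth, nonzero gradient for x < 0.
    return x / (1.0 + np.exp(-x))

def stochastic_activation(x, p_relu=0.5, rng=None, training=True):
    """Bernoulli choice between ReLU and SiLU (illustrative sketch).

    During training, a single Bernoulli draw with probability `p_relu`
    selects ReLU for this call, otherwise SiLU. At inference
    (training=False) we always use ReLU to get sparse outputs.
    """
    if not training:
        return relu(x)
    rng = rng if rng is not None else np.random.default_rng()
    return relu(x) if rng.random() < p_relu else silu(x)
```

Because SiLU keeps a nonzero gradient for negative inputs, mixing it in during pre-training sidesteps the "dead" negative region of pure ReLU while still letting the model be fine-tuned and served with ReLU afterwards.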