Ultra-short summary: Boosting diversity in MoE models! A study of how well geometric regularization works ✨
💎 Gal-style sparkle points ✨
● They're studying how to make MoE models (models that combine experts) even smarter! Being smart is the best 💖
● They tested whether weight orthogonality (a trick to keep experts from all doing the same thing) actually works 🧐
● They found the limits of geometric regularization (tuning how spread-out the experts are), so an even better method might be on the way! So much to look forward to 🙌
Mixture-of-Experts (MoE) models achieve efficiency through sparse activation, but the role of geometric regularization in expert specialization remains unclear. We apply an orthogonality loss to enforce expert diversity and find that it fails on multiple fronts: it does not reduce weight-space overlap (MSO actually increases by up to 114%); activation-space overlap remains high (~0.6) regardless of regularization; and its effects on performance are inconsistent, with a marginal improvement on WikiText-103 (-0.9%), a slight degradation on TinyStories (+0.9%), and highly variable results on PTB (std > 1.0). Our analysis across 7 regularization strengths reveals no significant correlation (r = -0.293, p = 0.523) between weight-space and activation-space orthogonality. These findings demonstrate that weight-space regularization neither achieves its geometric goal nor reliably improves performance, making it unsuitable for promoting MoE diversity.
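The abstract does not spell out the exact form of the orthogonality loss, but a common formulation penalizes the cosine similarity between the flattened weight vectors of distinct experts. The sketch below (an assumption, not the paper's verified implementation; the function name `orthogonality_loss` is hypothetical) shows this soft-orthogonality penalty in numpy:

```python
import numpy as np

def orthogonality_loss(expert_weights):
    """Soft orthogonality penalty over a list of expert weight matrices.

    Each expert's weights are flattened and L2-normalized; the loss is the
    sum of squared cosine similarities between distinct expert pairs, so it
    is zero exactly when all experts are mutually orthogonal in weight space.
    This is an illustrative formulation, not necessarily the paper's.
    """
    flat = np.stack([np.ravel(w) for w in expert_weights])
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    gram = flat @ flat.T                              # pairwise cosine similarities
    off_diag = gram - np.eye(len(expert_weights))     # zero out self-similarity
    return float(np.sum(off_diag ** 2))

# Orthogonal experts incur no penalty; identical experts are maximally penalized.
e1, e2 = np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])
print(orthogonality_loss([e1, e2]))  # 0.0
print(orthogonality_loss([e1, e1]))  # 2.0
```

In training, this penalty would be added to the task loss with a strength coefficient; the abstract's negative result is that sweeping such a coefficient over 7 values neither lowered weight-space overlap nor reliably helped perplexity.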