Super-quick summary: this is research on making MoE models (an AI's brain 🧠) easier to interpret, explained so even a gal can get it!
✨ Gal-style sparkle points ✨
● MoE models are like the hottest squad in the AI world! A bunch of specialists (experts) team up to solve problems 😎
● Better interpretability means you can finally see why the AI said what it said! Like your fave spilling their true feelings 💖
● Huge for fields where AI decisions really matter, like finance and healthcare! The AI's trustworthiness goes way up, right?
Now for the detailed breakdown~ 🎤
Mixture of Experts (MoE) models have become central to scaling large language models, yet their mechanistic differences from dense networks remain poorly understood. Previous work has explored how dense models use "superposition" to represent more features than dimensions, and how superposition is a function of feature sparsity and feature importance. MoE models cannot be explained mechanistically through the same lens. We find that neither feature sparsity nor feature importance causes discontinuous phase changes, and that network sparsity (the ratio of active to total experts) better characterizes MoEs. We develop new metrics for measuring superposition across experts. Our findings demonstrate that models with greater network sparsity exhibit greater "monosemanticity". We propose a new definition of expert specialization based on monosemantic feature representation rather than load balancing, showing that experts naturally organize around coherent feature combinations when initialized appropriately. These results suggest that network sparsity in MoEs may enable more interpretable models without sacrificing performance, challenging the common assumption that interpretability and capability are fundamentally at odds.
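As a rough illustration of these ideas (not the paper's actual metrics or setup), the toy sketch below builds a random bank of expert weight matrices, computes network sparsity as the ratio of active to total experts under top-k routing, and scores each expert with a simple, assumed monosemanticity measure: the fraction of each hidden unit's squared weight mass that falls on its single largest input feature. All names, dimensions, and the scoring function (`monosemanticity_score`) are hypothetical choices made only for this example.

```python
# Hypothetical sketch: network sparsity and a per-expert monosemanticity score
# for a toy top-k MoE layer. Not the paper's actual definitions.

import numpy as np

rng = np.random.default_rng(0)

n_features = 32   # input feature dimension
d_hidden = 8      # hidden dimension per expert (fewer dims than features)
n_experts = 16    # total experts
k_active = 2      # experts activated per token (top-k routing)

# Network sparsity as described in the abstract: active / total experts.
network_sparsity = k_active / n_experts
print(f"network sparsity (active/total experts): {network_sparsity:.3f}")

# Each expert maps features -> hidden units with its own weight matrix W_e.
expert_weights = rng.normal(size=(n_experts, d_hidden, n_features))

def monosemanticity_score(W):
    """Fraction of each hidden unit's squared weight mass carried by its
    single largest input feature; 1.0 means every unit reads one feature."""
    sq = W ** 2
    per_unit = sq.max(axis=1) / (sq.sum(axis=1) + 1e-12)
    return per_unit.mean()

scores = np.array([monosemanticity_score(W) for W in expert_weights])
print(f"mean per-expert monosemanticity: {scores.mean():.3f}")
print(f"most monosemantic expert: {scores.argmax()} (score {scores.max():.3f})")
```

In this framing, comparing the score distribution across runs with different `k_active` would be one crude way to probe the abstract's claim that sparser routing (smaller active/total ratio) tends to go with more monosemantic experts.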