Ultra-short summary: a way to build an LLM (large language model) on the cheap ☆
✨ Gyaru-style sparkle points ✨
● You can build a smart AI even without pricey GPUs (graphics cards) 💖
● Data clustering (grouping the data) makes training efficient!
● It doesn't just cut costs, it might speed up AI progress too ♪
Here comes the detailed rundown~!
Background: Training (raising) an LLM is expensive 💸 You need high-performance GPUs, and the cloud bills are no joke 😭 So the authors studied a way to train LLMs at low cost! The IT industry wants cost-effective AI-building tech anyway, right?
Training large-scale Mixture-of-Experts (MoE) models typically requires high-memory, high-bandwidth GPUs (e.g., A100), and their high cost has become a major barrier to large-model training. In contrast, affordable hardware is low-cost but constrained by memory capacity and bandwidth, making it unsuitable for direct LLM training. To address this, we propose MoE-DisCo (Mixture-of-Experts with Disentangled Clustering and Coordination), a staged training framework. MoE-DisCo decomposes the MoE model into multiple dense submodels, each consisting of a shared backbone and a single expert, and partitions the training data into subsets using unsupervised clustering. Each submodel is trained independently and in parallel on its assigned data subset using low-cost devices, without any inter-device communication. Subsequently, all experts are integrated into a complete MoE model and fine-tuned globally for a short period on high-memory, high-bandwidth GPUs. Experiments show that our method matches or even surpasses full-parameter training across multiple downstream tasks, in training loss, and in perplexity (PPL), while reducing training cost by 47.6 to 69.5 percent on Qwen1.5-MoE-2.7B and Llama-MoE-3.5B across different datasets.
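To make the data-partitioning step concrete, here is a minimal sketch of the unsupervised clustering idea: document embeddings are grouped with k-means, and each cluster becomes the private training subset of one expert's submodel. This is an illustrative stand-in only; the abstract does not specify the clustering algorithm, the embedding method, or any function names, so everything below (the `kmeans` helper, the toy 2-D "embeddings") is an assumption.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over toy document embeddings: produces k data
    subsets, one per expert (assumed stand-in for the paper's
    unsupervised clustering step)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # random initial centers
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: recompute each center as the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

# Toy 2-D "embeddings" with two well-separated groups of documents.
docs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
assign = kmeans(docs, k=2)

# Each subset would then train one dense submodel (shared backbone +
# one expert) independently on a cheap device, with no cross-device
# communication, before the experts are merged and briefly fine-tuned.
subsets = {c: [i for i, a in enumerate(assign) if a == c] for c in set(assign)}
print(subsets)
```

In a real pipeline the embeddings would come from an encoder and the number of clusters would match the MoE's expert count; the merge-and-fine-tune stage on high-memory GPUs is not shown here.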