Published: 2025/10/23 11:10:05

RailS turbocharges MoE training 🚀

Super summary: RailS clears the all-to-all communication bottleneck and slashes MoE model training time 💖

✨ Sparkly Gal-Style Highlights ✨

● MoE (Mixture-of-Experts) models are a clever trick that splits a model across experts to make computation efficient 😎 The all-to-all communication was the bottleneck, and RailS fixes it!
● It builds on the Rail architecture, a modern design that directly connects multiple NICs (network cards), so communication speed looks set to skyrocket ✨
● RailS exploits the Rail topology's properties to optimize load balancing, cutting training time and seriously speeding up AI model development 💖

Now for the detailed explanation~!

Continue reading in the 「らくらく論文」 app

RailS: Load Balancing for All-to-All Communication in Distributed Mixture-of-Experts Training

Heng Xu / Zhiwei Yu / Chengze Du / Ying Zhou / Letian Li / Haojie Wang / Weiqiang Cheng / Jialong Li

Training Mixture-of-Experts (MoE) models introduces sparse and highly imbalanced all-to-all communication that dominates iteration time. Conventional load-balancing methods fail to exploit the deterministic topology of Rail architectures, leaving multi-NIC bandwidth underutilized. We present RailS, a distributed load-balancing framework that minimizes all-to-all completion time in MoE training. RailS leverages the Rail topology's symmetry to prove that uniform sending ensures uniform receiving, transforming global coordination into local scheduling. Each node independently executes a Longest Processing Time First (LPT) spraying scheduler to proactively balance traffic using local information. RailS activates N parallel rails for fine-grained, topology-aware multipath transmission. Across synthetic and real-world MoE workloads, RailS improves bus bandwidth by 20%--78% and reduces completion time by 17%--78%. For Mixtral workloads, it shortens iteration time by 18%--40% and achieves near-optimal load balance, fully exploiting architectural parallelism in distributed training.
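The abstract's key scheduling idea is that each node locally runs a Longest Processing Time First (LPT) scheduler to spray its outgoing traffic across N parallel rails. Here is a minimal Python sketch of that idea, not the paper's implementation: the function name `lpt_spray`, the use of a min-heap, and the example message sizes are all illustrative assumptions.

```python
import heapq

def lpt_spray(chunk_sizes, num_rails):
    """Sketch of an LPT-style spraying scheduler (illustrative, not RailS itself).

    chunk_sizes: per-destination message sizes (bytes) held locally by this node.
    num_rails:   number of parallel NICs/rails available on this node.
    Returns (assignment, loads): assignment[i] is the rail chosen for chunk i,
    loads is the final (load, rail) list.
    """
    # Min-heap of (current load, rail id): pop always yields the least-loaded rail.
    heap = [(0, r) for r in range(num_rails)]
    heapq.heapify(heap)

    assignment = [None] * len(chunk_sizes)
    # LPT rule: place the largest remaining chunk on the currently least-loaded rail.
    order = sorted(range(len(chunk_sizes)), key=lambda i: chunk_sizes[i], reverse=True)
    for i in order:
        load, rail = heapq.heappop(heap)
        assignment[i] = rail
        heapq.heappush(heap, (load + chunk_sizes[i], rail))

    return assignment, sorted(heap)

if __name__ == "__main__":
    # Hypothetical, skewed MoE-style all-to-all sizes spread over 4 rails.
    sizes = [900, 120, 640, 300, 300, 80, 500, 60]
    assignment, loads = lpt_spray(sizes, num_rails=4)
    print("assignment:", assignment)
    print("per-rail load:", loads)
```

On a skewed example like the one above, the LPT rule keeps the most-loaded rail close to the average, using only local information; per the abstract, the Rail topology's symmetry then ensures that uniform sending also yields uniform receiving, so no global coordination is needed.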

cs / cs.DC / cs.NI