Published: 2026/1/2 18:15:12

Blazing-Fast LLMs with Sparse NNs! A Technique as Smart as the Brain ✨

  1. Super summary: It's about a technique that makes LLMs lightweight like the brain while keeping them smart!
  2. Gal-style sparkle points:
    • Energy-efficient like the brain 🧠! Apparently it makes Transformers and LLMs smarter!
    • Compute costs (money 💰) go down, so all kinds of people get to use AI!
    • Real-time AI, and maybe even smart AI on your phone 📱, could become a thing!
  3. Detailed explanation:
    • Background: Recent AI is amazing, but the computation can be a real burden, right? 🤯
    • Method: They built neural networks with a brain-like "sparse" structure (not everything is connected)! It uses an amazing trick called "dynamic sparse training (DST)" that rewires the connections on the fly ✨
    • Results: The computation gets lighter, and LLM performance stays on par with fully connected networks! 😳
    • Significance: A big win for the IT industry! AI becomes even more accessible ♪
  4. Ideas for real-life uses:
    • 💡 Your phone's AI assistant gets smarter and runs super smoothly!
    • 💡 Video editing and other creative work gets way easier with AI!
  5. For anyone who wants to dig deeper:
    • 🔍 Transformer
    • 🔍 Large language models (LLMs)
    • 🔍 Sparse neural networks


Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected

Yingtao Zhang / Diego Cerretti / Jialin Zhao / Wenjing Wu / Ziheng Liao / Umberto Michieli / Carlo Vittorio Cannistraci

Dynamic sparse training (DST) can reduce the computational demands of ANNs, but struggles to maintain peak performance at high sparsity levels. Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in DST. CHT leverages gradient-free, topology-driven link regrowth, which has shown an ultra-sparse (less than 1% connectivity) advantage across various tasks compared to fully connected networks. Yet CHT suffers from two main drawbacks: (i) its time complexity is $O(Nd^3)$, where $N$ is the network node size and $d$ the node degree, restricting it to ultra-sparse regimes; (ii) it selects the top link-prediction scores, which is inappropriate in the early training epochs, when the network still presents unreliable connections. Here, we design the first brain-inspired network model, termed bipartite receptive field (BRF), to initialize the connectivity of sparse artificial neural networks. We further introduce a GPU-friendly, matrix-based approximation of CH link prediction, reducing the complexity to $O(N^3)$. We introduce the Cannistraci-Hebb training soft rule (CHTs), which adopts a flexible strategy for sampling connections in both link removal and regrowth, balancing exploration and exploitation of the network topology. Additionally, we integrate CHTs with a sigmoid gradual density decay (CHTss). Empirical results show that BRF offers performance advantages over previous network science models. Using 1% of the connections, CHTs outperforms fully connected networks in MLP architectures on image classification tasks, compressing some networks to less than 30% of the nodes. Using 5% of the connections, CHTss outperforms fully connected networks on two Transformer-based machine translation tasks. Finally, at 30% connectivity, both CHTs and CHTss outperform other DST methods on a language modeling task.
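
To make the prune-and-regrow loop behind dynamic sparse training more concrete, here is a minimal NumPy sketch, not the authors' implementation: the function names (`sigmoid_density_schedule`, `soft_sample`, `dst_update`) and the `churn_frac` parameter are illustrative, the Cannistraci-Hebb link-prediction scores are replaced by a placeholder array, and the temperature-controlled softmax sampling only mimics the exploration/exploitation idea of the CHTs soft rule and the sigmoid density decay of CHTss.

```python
import numpy as np

def sigmoid_density_schedule(step, total_steps, d_start=1.0, d_final=0.05, k=10.0):
    """Smoothly interpolate the target density from d_start to d_final with a
    sigmoid in training progress (illustrative stand-in for a sigmoid gradual
    density decay)."""
    t = step / total_steps                               # progress in [0, 1]
    s = 1.0 / (1.0 + np.exp(-k * (t - 0.5)))             # ~0 early, ~1 late
    return d_start + (d_final - d_start) * s

def soft_sample(scores, n, temperature=1.0, rng=None):
    """Sample n indices with probability proportional to softmax(scores / T).
    High-scoring candidates are favored, but low-scoring ones keep a nonzero
    chance, trading off exploitation vs. exploration of the topology."""
    rng = rng or np.random.default_rng()
    p = np.exp((scores - scores.max()) / temperature)
    p /= p.sum()
    return rng.choice(len(scores), size=n, replace=False, p=p)

def dst_update(mask, weight_magnitude, link_scores, target_density,
               churn_frac=0.1, rng=None):
    """One prune-and-regrow step on a 2-D connectivity mask. A fraction of
    existing links (churn_frac) is removed on top of any density reduction,
    then links are regrown back up to the target density. Removal soft-samples
    links with low |w|; regrowth soft-samples absent links with high topology
    scores (a placeholder for Cannistraci-Hebb link-prediction scores)."""
    rng = rng or np.random.default_rng()
    n_target = int(target_density * mask.size)

    active = np.flatnonzero(mask)
    n_remove = max(len(active) - n_target, 0) + int(churn_frac * len(active))
    n_remove = min(n_remove, len(active))
    if n_remove > 0:
        # Negate magnitudes so weaker links are more likely to be dropped.
        drop = soft_sample(-weight_magnitude.flat[active], n_remove, rng=rng)
        mask.flat[active[drop]] = 0

    inactive = np.flatnonzero(mask == 0)
    n_grow = min(n_target - int(mask.sum()), len(inactive))
    if n_grow > 0:
        grow = soft_sample(link_scores.flat[inactive], n_grow, rng=rng)
        mask.flat[inactive[grow]] = 1
    return mask

# Usage: decay a 256x256 layer from fully connected to 5% density.
rng = np.random.default_rng(0)
mask = np.ones((256, 256), dtype=np.int8)
total_steps = 1000
for step in range(0, total_steps, 100):
    W = rng.normal(size=mask.shape)            # stand-in for learned weights
    scores = rng.random(size=mask.shape)       # stand-in for CH link scores
    density = sigmoid_density_schedule(step, total_steps)
    mask = dst_update(mask, np.abs(W), scores, density, rng=rng)
```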

cs / cs.LG