Published: 2026/1/2 18:15:12

Blazing-Fast LLMs with Sparse NNs! A Technique as Smart as the Brain ✨

  1. Super summary: It's about a technique that makes LLMs lightweight like the brain while keeping them smart!
  2. Gal-style sparkle points:
    • Energy-efficient like the brain 🧠! Apparently it makes Transformers and LLMs smarter!
    • Compute costs (money 💰) go down, so all kinds of people get to use AI!
    • Real-time AI, and maybe even smart AI on your phone 📱, could become a thing!
  3. Detailed explanation:
    • Background: Recent AI is amazing, but the computation can be a real burden, right? 🤯
    • Method: They built neural networks with a brain-like "sparse" structure (not everything is connected)! It uses an amazing trick called "dynamic sparse training (DST)" that rewires the connections on the fly ✨
    • Results: The computation gets lighter, and LLM performance stays on par with fully connected networks! 😳
    • Significance: A big win for the IT industry! AI becomes even more accessible ♪
  4. Ideas for real-life uses:
    • 💡 Your phone's AI assistant gets smarter and runs super smoothly!
    • 💡 Video editing and other creative work gets way easier with AI!
  5. For anyone who wants to dig deeper:
    • 🔍 Transformer
    • 🔍 Large language models (LLMs)
    • 🔍 Sparse neural networks


Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected

Yingtao Zhang / Diego Cerretti / Jialin Zhao / Wenjing Wu / Ziheng Liao / Umberto Michieli / Carlo Vittorio Cannistraci

Dynamic sparse training (DST) can reduce the computational demands of ANNs, but struggles to maintain peak performance at high sparsity levels. Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in DST. CHT leverages gradient-free, topology-driven link regrowth, which has shown an ultra-sparse (less than 1% connectivity) advantage across various tasks compared to fully connected networks. Yet CHT suffers from two main drawbacks: (i) its time complexity is $O(Nd^3)$, where $N$ is the network node size and $d$ the node degree, restricting it to ultra-sparse regimes; (ii) it selects the top link-prediction scores, which is inappropriate in the early training epochs, when the network still presents unreliable connections. Here, we design the first brain-inspired network model, termed bipartite receptive field (BRF), to initialize the connectivity of sparse artificial neural networks. We further introduce a GPU-friendly, matrix-based approximation of CH link prediction, reducing the complexity to $O(N^3)$. We introduce the Cannistraci-Hebb training soft rule (CHTs), which adopts a flexible strategy for sampling connections in both link removal and regrowth, balancing exploration and exploitation of the network topology. Additionally, we integrate CHTs with a sigmoid gradual density decay (CHTss). Empirical results show that BRF offers performance advantages over previous network science models. Using 1% of the connections, CHTs outperforms fully connected networks in MLP architectures on image classification tasks, compressing some networks to less than 30% of the nodes. Using 5% of the connections, CHTss outperforms fully connected networks on two Transformer-based machine translation tasks. Finally, at 30% connectivity, both CHTs and CHTss outperform other DST methods on a language modeling task.
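
To make the prune-and-regrow loop behind dynamic sparse training more concrete, here is a minimal NumPy sketch, not the authors' implementation: the function names (`sigmoid_density_schedule`, `soft_sample`, `dst_update`) and the `churn_frac` parameter are illustrative, the Cannistraci-Hebb link-prediction scores are replaced by a placeholder array, and the temperature-controlled softmax sampling only mimics the exploration/exploitation idea of the CHTs soft rule and the sigmoid density decay of CHTss.

```python
import numpy as np

def sigmoid_density_schedule(step, total_steps, d_start=1.0, d_final=0.05, k=10.0):
    """Smoothly interpolate the target density from d_start to d_final with a
    sigmoid in training progress (illustrative stand-in for a sigmoid gradual
    density decay)."""
    t = step / total_steps                               # progress in [0, 1]
    s = 1.0 / (1.0 + np.exp(-k * (t - 0.5)))             # ~0 early, ~1 late
    return d_start + (d_final - d_start) * s

def soft_sample(scores, n, temperature=1.0, rng=None):
    """Sample n indices with probability proportional to softmax(scores / T).
    High-scoring candidates are favored, but low-scoring ones keep a nonzero
    chance, trading off exploitation vs. exploration of the topology."""
    rng = rng or np.random.default_rng()
    p = np.exp((scores - scores.max()) / temperature)
    p /= p.sum()
    return rng.choice(len(scores), size=n, replace=False, p=p)

def dst_update(mask, weight_magnitude, link_scores, target_density,
               churn_frac=0.1, rng=None):
    """One prune-and-regrow step on a 2-D connectivity mask. A fraction of
    existing links (churn_frac) is removed on top of any density reduction,
    then links are regrown back up to the target density. Removal soft-samples
    links with low |w|; regrowth soft-samples absent links with high topology
    scores (a placeholder for Cannistraci-Hebb link-prediction scores)."""
    rng = rng or np.random.default_rng()
    n_target = int(target_density * mask.size)

    active = np.flatnonzero(mask)
    n_remove = max(len(active) - n_target, 0) + int(churn_frac * len(active))
    n_remove = min(n_remove, len(active))
    if n_remove > 0:
        # Negate magnitudes so weaker links are more likely to be dropped.
        drop = soft_sample(-weight_magnitude.flat[active], n_remove, rng=rng)
        mask.flat[active[drop]] = 0

    inactive = np.flatnonzero(mask == 0)
    n_grow = min(n_target - int(mask.sum()), len(inactive))
    if n_grow > 0:
        grow = soft_sample(link_scores.flat[inactive], n_grow, rng=rng)
        mask.flat[inactive[grow]] = 1
    return mask

# Usage: decay a 256x256 layer from fully connected to 5% density.
rng = np.random.default_rng(0)
mask = np.ones((256, 256), dtype=np.int8)
total_steps = 1000
for step in range(0, total_steps, 100):
    W = rng.normal(size=mask.shape)            # stand-in for learned weights
    scores = rng.random(size=mask.shape)       # stand-in for CH link scores
    density = sigmoid_density_schedule(step, total_steps)
    mask = dst_update(mask, np.abs(W), scores, density, rng=rng)
```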

cs / cs.LG