Published: 2025/12/17 8:50:04

Slim down your LLM without dumbing it down! So what exactly is DenoiseRotator? ✨

Ultra-short summary: Shrink a large language model (LLM) and it stays just as smart! DenoiseRotator brings inference costs down too! 💰

Gyaru-style sparkle points ✨

● Successfully slims the model down! It might even run on your smartphone ♪📱
● The important parts stay intact! No loss of smarts, isn't that the strongest? 😎
● It also helps cut costs across the IT industry! A true WIN-WIN 💖

Detailed explanation

Read the rest in the 「らくらく論文」 app

DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration

Tianteng Gu / Bei Liu / Bo Xiao / Ke Zeng / Jiacheng Liu / Yanmin Qian

Pruning is a widely used technique to compress large language models (LLMs) by removing unimportant weights, but it often suffers from significant performance degradation - especially under semi-structured sparsity constraints. Existing pruning methods primarily focus on estimating the importance of individual weights, which limits their ability to preserve critical capabilities of the model. In this work, we propose a new perspective: rather than merely selecting which weights to prune, we first redistribute parameter importance to make the model inherently more amenable to pruning. By minimizing the information entropy of normalized importance scores, our approach concentrates importance onto a smaller subset of weights, thereby enhancing pruning robustness. We instantiate this idea through DenoiseRotator, which applies learnable orthogonal transformations to the model's weight matrices. Our method can be seamlessly integrated with existing pruning techniques such as Magnitude, SparseGPT, and Wanda. Evaluated on LLaMA3, Qwen2.5, and Mistral models under 50% unstructured and 2:4 semi-structured sparsity, DenoiseRotator consistently improves perplexity and zero-shot accuracy. For instance, on LLaMA3-70B pruned with SparseGPT at 2:4 semi-structured sparsity, DenoiseRotator reduces the perplexity gap to the dense model by 58%, narrowing the degradation from 8.1 to 3.4 points. Codes are available at https://github.com/Axel-gu/DenoiseRotator.
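
To make the core idea concrete, here is a minimal, hypothetical PyTorch sketch: it learns an orthogonal rotation Q that lowers the entropy of normalized, magnitude-based importance scores of a toy weight matrix, prunes in the rotated basis, and compares the reconstruction error against plain magnitude pruning. The matrix size, optimizer settings, and the use of |w| as the importance score are illustrative assumptions, not the authors' DenoiseRotator implementation (that lives in the linked repository).

```python
# Minimal sketch of the importance-concentration idea (NOT the authors' code).
# Assumptions: importance = |weight| (Magnitude pruning), a toy 64x64 weight
# matrix, and gradient descent on an orthogonal parametrization.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

torch.manual_seed(0)
W = torch.randn(64, 64)  # toy weight matrix standing in for a linear layer


def importance_entropy(w: torch.Tensor) -> torch.Tensor:
    """Entropy of normalized importance scores (importance = |w| here)."""
    scores = w.abs().flatten()
    p = scores / scores.sum()
    return -(p * torch.log(p + 1e-12)).sum()


def magnitude_prune(w: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of weights (unstructured)."""
    k = int(sparsity * w.numel())
    threshold = w.abs().flatten().kthvalue(k).values
    return w * (w.abs() > threshold)


# Learnable square matrix Q, kept orthogonal by the parametrization.
rotator = orthogonal(nn.Linear(64, 64, bias=False))
optimizer = torch.optim.Adam(rotator.parameters(), lr=1e-2)

for _ in range(150):
    Q = rotator.weight                  # orthogonal by construction
    loss = importance_entropy(W @ Q)    # concentrate importance in W @ Q
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Q = rotator.weight.detach()

# Prune in the rotated basis; since Q is orthogonal, W x = (W Q)(Q^T x),
# so the rotation can in principle be folded into surrounding layers.
err_rotated = torch.linalg.norm(W - magnitude_prune(W @ Q) @ Q.T)
err_direct = torch.linalg.norm(W - magnitude_prune(W))
print(f"reconstruction error, plain magnitude pruning : {err_direct.item():.3f}")
print(f"reconstruction error, rotate-then-prune       : {err_rotated.item():.3f}")
```

Because the reconstruction error equals the magnitude that pruning discards, concentrating importance onto fewer weights before pruning should leave less mass to remove, which is the intuition this toy comparison is meant to illustrate.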

cs / cs.LG / cs.CL