Title & quick summary: GRASP makes LLMs (large language models) smart and light ✨
● Shares parameters (the model's building blocks) cleverly, so training speeds way up 💨 ● Robust against hardware noise 💪 ● Could let LLMs run even on edge AI (AI on everyday devices) 💕
LLMs are super smart, but training them takes tons of time and money 🥺 And with so many parameters, they can't run on small devices like edge AI hardware 💦
GRASP splits the hidden representations inside the Transformer (the LLM's basic structure) into groups and shares parameters within each group! Like LoRA (Low-Rank Adaptation, another fine-tuning method), it doesn't update all the parameters, so it's efficient ✨ StochGRASP adds extra tricks to stay robust to noise 😎
It reportedly matches LoRA-level performance with far fewer parameters! Amazing 😳 And since it also gets tougher against hardware noise, LLMs could run in all sorts of environments 🫶
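The grouped sharing described above can be sketched minimally. This is our illustrative reconstruction, not the paper's code; the shapes, names, and the assumption that K evenly divides D are ours:

```python
import numpy as np

D, K, T = 768, 16, 4             # hidden size, number of groups (K << D), tokens
group_size = D // K              # assume K divides D for simplicity

rng = np.random.default_rng(0)
h = rng.standard_normal((T, D))  # stand-in for one layer's token representations

# Only 2*K trainable parameters per modulated layer:
# one shared (scale, shift) pair per group of channels.
gamma = np.ones(K)               # per-group scaling, identity at init
beta = np.zeros(K)               # per-group shifting, zero at init

def grasp_modulate(h, gamma, beta):
    # Broadcast each group's shared scale/shift across its group_size channels.
    scale = np.repeat(gamma, group_size)  # shape (D,)
    shift = np.repeat(beta, group_size)   # shape (D,)
    return h * scale + shift

out = grasp_modulate(h, gamma, beta)
```

With D = 768 and K = 16, that is 32 trainable values per modulated layer, versus the thousands a LoRA adapter of typical rank would add at the same layer.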
Parameter-efficient fine-tuning (PEFT) provides a scalable alternative to full-model adaptation by updating only a small subset of parameters in large pre-trained models. We introduce GRASP (GRouped Activation Shared Parameterization), a lightweight PEFT framework that partitions the D-dimensional token representations of selected layers into K << D groups and learns a shared scaling and shifting vector for each group. This grouped modulation significantly reduces the number of trainable parameters while preserving the model's ability to learn task-specific features. Building on this formulation, we further propose StochGRASP, which learns Gaussian distributions as perturbations to the pre-trained weights rather than deterministic values. This probabilistic parameterization, combined with a noise-aware loss formulation, models hardware-level variability in programmed weights and significantly improves robustness under non-ideal inference conditions, an important requirement for deployment on emerging edge AI hardware. Across GLUE (RoBERTa-base & RoBERTa-large) and E2E NLG (GPT-2 Medium), GRASP matches or exceeds the performance of established PEFT methods while achieving an order-of-magnitude reduction in trainable parameters compared to LoRA and BitFit. Under varying levels of noise, StochGRASP consistently outperforms its deterministic variants, demonstrating its suitability for energy-efficient and noise-prone hardware platforms.
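The probabilistic idea behind StochGRASP can be sketched as follows. This is a hedged illustration: the softplus parameterization of the standard deviation and the reparameterization-style sampling are our assumptions, and the paper's exact noise-aware loss is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
K, group_size, T = 16, 48, 4     # groups, channels per group, tokens
D = K * group_size               # hidden size (768)

h = rng.standard_normal((T, D))  # stand-in for token representations

# Each per-group scale is a Gaussian N(mu, sigma^2) rather than a point value.
mu = np.ones(K)                  # mean of the learned scale
rho = np.full(K, -3.0)           # sigma = softplus(rho) keeps sigma > 0
beta = np.zeros(K)               # deterministic per-group shift

def stoch_grasp_forward(h, mu, rho, beta, rng):
    sigma = np.log1p(np.exp(rho))                 # softplus
    gamma = mu + sigma * rng.standard_normal(K)   # sample a noisy scale
    scale = np.repeat(gamma, group_size)
    shift = np.repeat(beta, group_size)
    return h * scale + shift

# Training through sampled weights teaches the model to tolerate the kind of
# weight perturbations that non-ideal (noisy) hardware introduces at inference.
noisy = stoch_grasp_forward(h, mu, rho, beta, rng)
```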