Published: 2025/8/22 16:39:53

Total Fave! Tech That Makes GNNs Blazing Fast 🚀✨

Ultra-short summary: Found a way to run GNNs blazing fast, even on your phone 💖

Gal-Style Sparkle Points ✨

● Smart recommendations even on tiny devices like smartphones (edge devices)? Isn't that god-tier? 🌟

● It's a technique that makes the computations of a super-smart AI called a GNN (graph neural network) way more efficient 😍

Read the rest in the 「らくらく論文」 app

A Node-Aware Dynamic Quantization Approach for Graph Collaborative Filtering

Lin Li / Chunyang Li / Yu Yin / Xiaohui Tao / Jianwei Zhang

In collaborative filtering recommendation systems, Graph Neural Networks (GNNs) have demonstrated remarkable performance but face significant challenges when deployed on resource-constrained edge devices due to their high embedding parameter requirements and computational costs. Applying common quantization methods directly to node embeddings can overlook their graph-based structure, causing error accumulation during message passing and degrading the quality of the quantized embeddings. To address this, we propose Graph-based Node-Aware Dynamic Quantization training for collaborative filtering (GNAQ), a novel quantization approach that leverages graph structural information to improve the balance between efficiency and accuracy of GNNs for Top-K recommendation. GNAQ introduces a node-aware dynamic quantization strategy that adapts quantization scales to individual node embeddings by incorporating graph interaction relationships. Specifically, it initializes quantization intervals based on node-wise feature distributions and dynamically refines them through message passing in GNN layers. This approach mitigates the information loss caused by fixed quantization scales and captures hierarchical semantic features in user-item interaction graphs. Additionally, GNAQ employs graph relation-aware gradient estimation in place of traditional straight-through estimators, ensuring more accurate gradient propagation during training. Extensive experiments on four real-world datasets demonstrate that GNAQ outperforms state-of-the-art quantization methods, including BiGeaR and N2UQ, achieving average improvements of 27.8% in Recall@10 and 17.6% in NDCG@10 under 2-bit quantization. In particular, GNAQ maintains the performance of full-precision models while reducing model size by 8 to 12 times; in addition, it trains twice as fast as quantization baseline methods.
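
To make the "node-aware" idea concrete, here is a minimal PyTorch sketch of per-node fake quantization. Everything in it is an illustrative assumption rather than the authors' released code: the function name `node_aware_fake_quant`, the symmetric grid, and the max-based per-node scale are stand-in choices, and the straight-through backward pass substitutes for the paper's graph relation-aware gradient estimator, whose details the abstract does not specify.

```python
# Hypothetical sketch of per-node quantization scales (not the GNAQ implementation).
import torch

def node_aware_fake_quant(emb: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Fake-quantize each row (one node embedding) with its own scale.

    emb: (num_nodes, dim) full-precision embedding table.
    Returns a tensor of the same shape whose values lie on a per-node
    uniform grid with 2**bits levels.
    """
    qmax = 2 ** (bits - 1) - 1  # symmetric signed grid, e.g. levels -2..1 for 2-bit
    # Per-node scale derived from each node's own value range, mirroring the
    # initialization from node-wise feature distributions described above.
    scale = emb.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / max(qmax, 1)
    q = torch.clamp(torch.round(emb / scale), -qmax - 1, qmax)
    deq = q * scale
    # Straight-through trick: forward pass uses the quantized values, backward
    # pass sees the identity. GNAQ replaces this crude estimator with a graph
    # relation-aware one.
    return emb + (deq - emb).detach()
```

In a LightGCN-style pipeline, one would fake-quantize the embedding table before each round of message passing so the propagated signals reflect the quantized values; the paper's dynamic refinement of quantization intervals during propagation goes beyond this fixed-per-call sketch.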

cs / cs.IR