A super-speedy recommendation (recsys) system just dropped 💖
✨ Gal-Style Sparkle Points ✨
● Updating the model with LoRA on the fly, right while it's serving inference? Major time-saver, right?
● Keeping model accuracy while putting idle CPUs to work is eco-friendly genius 👏
● Fresher recommendations mean your fan life gets a boost too 🫶
Here comes the detailed breakdown~!
Deep Learning Recommendation Models (DLRMs) underpin personalized services but face a critical freshness-accuracy tradeoff due to massive parameter-synchronization overheads. Production DLRMs deploy decoupled training/inference clusters, where synchronizing petabyte-scale embedding tables (EMTs) causes multi-minute staleness, degrading recommendation quality and revenue. We observe that (1) inference nodes exhibit sustained CPU underutilization (peak <= 20%), and (2) EMT gradients possess intrinsic low-rank structure, enabling a compact update representation. We present LiveUpdate, a system that eliminates inter-cluster synchronization by colocating Low-Rank Adaptation (LoRA) trainers within inference nodes. LiveUpdate addresses two core challenges: (1) dynamic rank adaptation via singular-value monitoring to constrain memory overhead (<2% of EMTs), and (2) NUMA-aware resource scheduling with hardware-enforced QoS to eliminate contention between updates and inference (P99 latency impact <20ms). Evaluations show LiveUpdate reduces update costs by 2x versus delta-update baselines while achieving higher accuracy within 1-hour windows. By transforming idle inference resources into freshness engines, LiveUpdate delivers online model updates while outperforming state-of-the-art delta-update methods by 0.04% to 0.24% in accuracy.
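To make the two key ideas concrete, here is a minimal NumPy sketch of (a) exploiting the low-rank structure of an EMT gradient via an SVD-based factorization, and (b) dynamic rank adaptation by monitoring singular values. The 99%-energy threshold, the `max_rank` cap, and all variable names are illustrative assumptions, not details from the paper, and a toy matrix stands in for a petabyte-scale EMT.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table (a real EMT is petabyte-scale; this is a stand-in).
n_rows, dim = 1000, 64
emt = rng.standard_normal((n_rows, dim)).astype(np.float32)

# Synthetic accumulated gradient with intrinsic low-rank structure,
# mimicking the paper's observation about EMT gradients.
true_rank = 4
grad = (rng.standard_normal((n_rows, true_rank))
        @ rng.standard_normal((true_rank, dim))).astype(np.float32)

def adapt_rank(update, energy=0.99, max_rank=16):
    """Pick the smallest rank whose singular values capture `energy`
    of the update's spectral energy (threshold and cap are assumed
    here for illustration), then return LoRA-style factors A, B with
    update ~= A @ B."""
    u, s, vt = np.linalg.svd(update, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    r = min(int(np.searchsorted(cum, energy)) + 1, max_rank)
    A = u[:, :r] * s[:r]   # (n_rows, r)
    B = vt[:r, :]          # (r, dim)
    return A, B, r

A, B, r = adapt_rank(grad)
emt_updated = emt + A @ B  # apply the compact update to the table

# Memory for the factors vs. a dense delta of the same table.
dense_params = grad.size
lora_params = A.size + B.size
```

Storing only `A` and `B` instead of the dense delta is what bounds the memory overhead: for a rank-r update the factor cost is r*(n_rows + dim) parameters versus n_rows*dim for the full delta, which is how a <2% overhead budget becomes feasible when r is small.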