Super-short summary: REVATI is here, magic 🪄 that lets you measure LLM speed without a GPU!
✨ Gyaru-Style Sparkle Points ✨
● Measuring LLM speed without a GPU? That's seriously divine! ✨ Saving both cost and time is the best!
● You can try out tons of settings (parameters and stuff), so it feels like you could find the strongest LLM setup 💎
● Sensing new business opportunities on the way! 🤩 Cloud and AI platforms might get way more interesting!
Here comes the detailed explanation~!
Read the rest in the 「らくらく論文」 app
Deploying LLMs efficiently requires testing hundreds of serving configurations, but evaluating each one on a GPU cluster takes hours and costs thousands of dollars. Discrete-event simulators are faster and cheaper, but they require re-implementing the serving system's control logic -- a burden that compounds as frameworks evolve. We present Revati, a time-warp emulator that enables performance modeling by directly executing real serving system code at simulation-like speed. The system intercepts CUDA API calls to virtualize device management, allowing serving frameworks to run without physical GPUs. Instead of executing GPU kernels, it performs time jumps -- fast-forwarding virtual time by predicted kernel durations. We propose a coordination protocol that synchronizes these jumps across distributed processes while preserving causality. On vLLM and SGLang, Revati achieves less than 5% prediction error across multiple models and parallelism configurations, while running 5-17x faster than real GPU execution.
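The core trick in the abstract, replacing GPU kernel execution with "time jumps" on a virtual clock, can be sketched in a few lines. This is an illustrative toy, not Revati's actual code: the class names, the cost model, and the kernel names are all made up for the example; in the real system, kernel durations would come from a profiled or learned predictor, and the jumps would be coordinated across distributed processes.

```python
# Toy sketch of time-warp emulation: instead of launching a GPU kernel,
# fast-forward a virtual clock by the kernel's predicted duration.

class VirtualClock:
    def __init__(self):
        self.now = 0.0  # virtual time in milliseconds

    def jump(self, duration_ms):
        # A "time jump": advance virtual time rather than waiting
        # for a real kernel to finish.
        self.now += duration_ms

def predicted_kernel_duration(kernel_name, batch_size):
    # Hypothetical cost model (invented for this sketch): a real
    # emulator would use profiled or learned latency predictions.
    base_ms = {"prefill_attention": 2.0, "decode_attention": 0.5}
    return base_ms[kernel_name] * batch_size

def emulated_launch(clock, kernel_name, batch_size):
    # Where a real serving system would invoke the CUDA API to run a
    # kernel, the emulator intercepts the call and jumps instead.
    clock.jump(predicted_kernel_duration(kernel_name, batch_size))

clock = VirtualClock()
emulated_launch(clock, "prefill_attention", batch_size=8)  # +16.0 ms virtual
emulated_launch(clock, "decode_attention", batch_size=8)   # +4.0 ms virtual
print(clock.now)  # 20.0
```

Because nothing actually waits for GPU work, the serving framework's real scheduling and batching code runs end to end at simulation-like speed, which is how the paper reports 5-17x faster-than-real-time evaluation.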