Published: 2026/1/5 14:07:11

Turbocharging LLMs with FPGAs 🚀

Ultra-short summary: RWKV turbocharged on an FPGA! Higher performance and better energy efficiency than a GPU ☆

✨ Gal-Style Sparkle Points ✨

● They got RWKV, which can go toe-to-toe with Transformers, running on an FPGA. Isn't that amazing!? 🤩
● A new computation trick called A-PoT quantization makes it even faster! 😎 (see the sketch after this list~)
● A future where it shines in AI assistants and edge AI (like on your phone) might be coming~ 🥰
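
The bullet above name-drops A-PoT (additive powers-of-two) quantization without defining it, and the abstract below only calls it a "hardware-friendly hybrid-precision quantization strategy". So here is a minimal NumPy sketch of generic APoT quantization as described in the broader literature; the term count and per-term bit width are illustrative assumptions, not the paper's actual configuration. The FPGA appeal is that multiplying by a sum of powers of two reduces to shifts and adds, so no DSP multiplier is needed.

```python
import numpy as np
from itertools import product

def apot_levels(n_terms=2, bits_per_term=2):
    """Enumerate additive powers-of-two (APoT) quantization levels.

    Each level is a sum of n_terms terms, each either 0 or a power of
    two, so multiplying a fixed-point activation by a level costs only
    shifts and adds in hardware.
    """
    term_vals = [0.0] + [2.0 ** -k for k in range(2 ** bits_per_term - 1)]
    return np.array(sorted({sum(c) for c in product(term_vals, repeat=n_terms)}))

def apot_quantize(w, levels):
    """Snap each weight magnitude to its nearest APoT level; keep the sign."""
    idx = np.argmin(np.abs(np.abs(w)[..., None] - levels[None, :]), axis=-1)
    return np.sign(w) * levels[idx]

w = np.random.uniform(-1, 1, 8).astype(np.float32)
print(apot_quantize(w, apot_levels()))  # every output is a 2-term power-of-two sum
```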

Now for the detailed explanation~!

Read the rest in the 「らくらく論文」 app

HFRWKV: A High-Performance Fully On-Chip Hardware Accelerator for RWKV

Liu Shijie / Zeng Zhenghao / Jiao Han / Huang Yihua

RWKV is a modern RNN architecture that approaches the performance of Transformers, with the advantage of processing long contexts at a linear memory cost. However, its sequential computation pattern struggles to exploit GPU parallelism efficiently, which leads to low compute resource utilization. Furthermore, frequent off-chip weight accesses create a memory bottleneck. To address these challenges, we propose HFRWKV, an FPGA-based hardware accelerator specifically designed for RWKV. Within the matrix operation module, we propose a novel hardware-friendly hybrid-precision quantization strategy that enhances performance while maintaining acceptable accuracy. For complex operations such as exponentiation and division, we introduce reusable architectures combined with lookup tables or piecewise linear approximation, algorithmically refined to balance precision against hardware resource consumption. On this foundation, we adopt a fully on-chip computing system that integrates a parallel matrix-vector processing array with an efficient pipeline architecture. Through computation reordering and chunked double buffering, it effectively eliminates data transfer bottlenecks and improves overall throughput. We implement HFRWKV on the Alveo U50 and U280 platforms. Experimental results show that, compared to a CPU, HFRWKV achieves a 63.48× throughput improvement and a 139.17× energy efficiency improvement; compared to GPUs, it achieves a 32.33× throughput improvement and a 171.36× energy efficiency improvement.
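
To see why the abstract calls RWKV's computation pattern sequential, it helps to look at the token-time recurrence itself. Below is a numerically naive Python sketch in the style of the RWKV-4 WKV operator (production kernels renormalize the running state to avoid exp overflow, and the decay parameterization is simplified here). The point is that step t consumes state produced by step t-1, so the time loop cannot be parallelized the way attention can, while the carried state stays constant-size regardless of context length.

```python
import numpy as np

def wkv_sequential(w, u, k, v):
    """Naive per-channel WKV recurrence, RWKV-4 style (no stabilization).

    w, u : (C,) channel-wise decay and first-occurrence bonus
    k, v : (T, C) key and value sequences
    """
    T, C = k.shape
    a = np.zeros(C)                 # running exp-weighted sum of values
    b = np.zeros(C)                 # running sum of exp weights
    out = np.empty((T, C))
    for t in range(T):              # strictly sequential over tokens
        e = np.exp(u + k[t])
        out[t] = (a + e * v[t]) / (b + e)
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])
    return out

T, C = 16, 4
y = wkv_sequential(np.full(C, 0.5), np.zeros(C), 0.1 * np.random.randn(T, C),
                   np.random.randn(T, C))
print(y.shape)  # (16, 4): one output per token, with O(C) state, not O(T*C)
```

The abstract's exponentiation and division units rely on lookup tables or piecewise linear approximation, but it doesn't state the operating range or segment count. The sketch below uses arbitrary assumptions (chord interpolation over 32 segments of [-8, 0]) purely to show the mechanic: slopes and intercepts are precomputed once into a small table, after which every evaluation is one table lookup, one multiply, and one add, which maps naturally to on-chip LUTs plus a single DSP.

```python
import numpy as np

def build_pwl_exp_table(lo=-8.0, hi=0.0, n_seg=32):
    """Precompute per-segment (slope, intercept) pairs for exp(x) on [lo, hi]."""
    edges = np.linspace(lo, hi, n_seg + 1)
    x0, x1 = edges[:-1], edges[1:]
    slope = (np.exp(x1) - np.exp(x0)) / (x1 - x0)   # chord through segment ends
    intercept = np.exp(x0) - slope * x0
    return edges, slope, intercept

def pwl_exp(x, edges, slope, intercept):
    """Evaluate the approximation: locate the segment, then one multiply-add."""
    x = np.clip(x, edges[0], edges[-1])
    seg = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(slope) - 1)
    return slope[seg] * x + intercept[seg]

edges, s, c = build_pwl_exp_table()
xs = np.linspace(-8.0, 0.0, 1001)
print(np.max(np.abs(pwl_exp(xs, edges, s, c) - np.exp(xs))))  # worst-case error
```

Chunked double buffering is a ping-pong schedule: while the compute units consume the chunk sitting in one buffer, the next chunk streams into the other, so transfer latency hides behind computation. The following is only a host-side software analogy with made-up names (on the FPGA this would be BRAM/URAM buffers in a dataflow pipeline, not threads):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chunked_matvec(W, x, chunk=1024):
    """Double-buffered y = W @ x: prefetch chunk i+1 while computing on chunk i."""
    n = W.shape[0]

    def load(i):                        # stand-in for an off-chip burst read
        return W[i:i + chunk].copy()

    out = []
    with ThreadPoolExecutor(max_workers=1) as dma:   # the "DMA engine"
        pending = dma.submit(load, 0)   # prefetch the first chunk
        for i in range(0, n, chunk):
            buf = pending.result()      # wait until chunk i has landed
            if i + chunk < n:
                pending = dma.submit(load, i + chunk)  # overlap: fetch chunk i+1
            out.append(buf @ x)         # compute on chunk i meanwhile
    return np.concatenate(out)

W = np.random.randn(4096, 64).astype(np.float32)
x = np.random.randn(64).astype(np.float32)
assert np.allclose(chunked_matvec(W, x), W @ x, atol=1e-3)
```

The abstract's "computation reordering" would further arrange which weights each chunk needs so this overlap never stalls; those details are in the paper and not reconstructed here.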

cs / cs.AR