Super-short summary: NinA is an AI technique that makes robot motion generation blazing fast! Inference is quick, and it can be used for all kinds of things 💕
✨ Sparkle points ✨ ● Way faster than Diffusion Models 🚀 meaning inference is super quick! ● Actions are generated in a single pass! Super simple and easy to use 💖 ● A future where it shines in manufacturing and all sorts of other settings 🔥
Detailed explanation ● Background In the world of AI for controlling robots, Diffusion Models have been the mainstream choice, but they take a long time to generate motions 😭 That makes them a poor fit for robots that need to act in real time, right?
● Method Enter NinA (Normalizing Flows in Action), a new approach! It uses a technique called a Normalizing Flow (NF) to generate robot actions super fast, in a single pass! On top of that, it keeps computational costs down, so it can be deployed in many more places 🙌
Read the rest in the 「らくらく論文」 app
Recent advances in Vision-Language-Action (VLA) models have established a two-component architecture, where a pre-trained Vision-Language Model (VLM) encodes visual observations and task descriptions, and an action decoder maps these representations to continuous actions. Diffusion models have been widely adopted as action decoders due to their ability to model complex, multimodal action distributions. However, they require multiple iterative denoising steps at inference time or downstream techniques to speed up sampling, limiting their practicality in real-world settings where high-frequency control is crucial. In this work, we present NinA (Normalizing Flows in Action), a fast and expressive alternative to diffusion-based decoders for VLAs. NinA replaces the diffusion action decoder with a Normalizing Flow (NF) that enables one-shot sampling through an invertible transformation, significantly reducing inference time. We integrate NinA into the FLOWER VLA architecture and fine-tune on the LIBERO benchmark. Our experiments show that NinA matches the performance of its diffusion-based counterpart under the same training regime, while achieving substantially faster inference. These results suggest that NinA offers a promising path toward efficient, high-frequency VLA control without compromising performance.
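To make the speed argument concrete, here is a minimal NumPy-only sketch contrasting one-shot sampling through an invertible (affine) transform with an iterative diffusion-style decoder. All names and numbers here (`obs_dim`, `act_dim`, the toy conditioner weights, the 50-step loop) are illustrative assumptions, not the actual NinA or FLOWER implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 8, 4  # hypothetical embedding and action sizes

# Hypothetical conditioner: maps a VLM observation embedding to the
# parameters (shift, log-scale) of an invertible affine transform.
W_shift = rng.normal(size=(obs_dim, act_dim)) * 0.1
W_logscale = rng.normal(size=(obs_dim, act_dim)) * 0.1

def nf_sample(obs):
    """One-shot NF-style sampling: a single invertible map on Gaussian noise."""
    z = rng.normal(size=act_dim)           # base sample from N(0, I)
    shift = obs @ W_shift
    log_scale = obs @ W_logscale
    return z * np.exp(log_scale) + shift   # one pass, trivially invertible

def diffusion_sample(obs, steps=50):
    """Diffusion-style decoding (schematic): many iterative denoising passes."""
    x = rng.normal(size=act_dim)
    for _ in range(steps):                 # each step costs one network pass
        x = x - 0.05 * (x - obs @ W_shift) # toy denoising update
    return x

obs = rng.normal(size=obs_dim)
action_fast = nf_sample(obs)        # 1 forward pass
action_slow = diffusion_sample(obs) # 50 forward passes for a comparable sample
```

The point of the contrast is the cost structure: the NF decoder produces an action with a single evaluation of an invertible transform, while the diffusion decoder pays per denoising step, which is what limits high-frequency control.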