Published: 2025/11/7 21:47:55

Ariadne has arrived! Boosted VLM reasoning means the future is looking bright 💖

Ultra-summary: Power up the brain 🧠 of a VLM (an AI that understands images and text) with RL (reinforcement learning)! Spatial awareness goes through the roof ☆

🌟 Gal-style sparkle points ✨
● The VLM levels up by conquering mazes! Just like a game 🎮
● Training the VLM with RL teaches it to solve hard problems ♪
● It works in the real world too! Useful for navigation and robots, totally the best ✨

Time for the detailed breakdown~!

Background: A VLM is an AI that understands both images and text 💖 But it used to be a bit weak at spatial reasoning (like understanding where things are)! So, to make VLMs smarter, the idea was to train them with reinforcement learning (RL), where the model learns through trial and error, just like playing a game!

Read the rest in the "らくらく論文" app

Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries

Minghe Shen / Zhuo Zhi / Chonghan Liu / Shuo Xing / Zhengzhong Tu / Che Liu

While Vision-Language Models (VLMs) post-trained with Reinforcement Learning (RL) show impressive general reasoning, their evaluation is often confined to language-dominant tasks (e.g., math). This raises a critical question: can RL post-training truly extend the inherent capability boundary of a base VLM, particularly for visual-centric spatial tasks where it initially fails? To investigate this, we introduce Ariadne, a framework utilizing synthetic mazes for multi-step spatial reasoning where task difficulty (e.g., path length, turns) is precisely controlled. We leverage this controllable environment to train VLMs using Reinforcement Learning with Verified Rewards (RLVR) in a difficulty-aware curriculum. Surprisingly, post-RLVR training, the VLM achieves over 50% accuracy on a problem set where the base model scored 0%, demonstrating that our approach expands the model's initial capability boundary. To assess real-world viability, we evaluate out-of-distribution (OOD) generalization on practical benchmarks. Despite training only on synthetic maze samples, Ariadne achieves significant zero-shot improvements, averaging 16% on MapBench (e.g., museum navigation) and 24% on ReasonMap (subway transfer tasks). These results confirm that our method not only broadens the model's fundamental limits but also enhances its generalization to real-world spatial reasoning. We acknowledge our study is limited to the post-training phase, given the opaqueness of pre-training data, and hope our research motivates further work on specialized, capability-extending alignment.
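The abstract's key ingredient, Reinforcement Learning with Verified Rewards (RLVR), hinges on tasks whose answers can be checked programmatically: a maze solver's proposed path either reaches the goal or it doesn't, so the reward needs no learned judge. Below is a minimal, hypothetical Python sketch of such a verified reward for a grid maze (the function and maze encoding are illustrative assumptions, not the authors' actual implementation); note how path length doubles as the difficulty knob the curriculum can control.

```python
# Hypothetical sketch of a "verified reward" for maze navigation, in the
# spirit of RLVR: reward is 1.0 only if the proposed move sequence is a
# legal walk from start to goal, and 0.0 otherwise (binary, checkable).

MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def verified_reward(maze, start, goal, moves):
    """maze: list of strings where '#' is a wall and '.' is an open cell."""
    r, c = start
    for m in moves:
        dr, dc = MOVES[m]
        r, c = r + dr, c + dc
        # Any illegal move (off the grid or into a wall) voids the answer.
        if not (0 <= r < len(maze) and 0 <= c < len(maze[0])) or maze[r][c] == "#":
            return 0.0
    return 1.0 if (r, c) == goal else 0.0

# Difficulty is controllable: longer required paths / more turns = harder task.
maze = [
    "....",
    ".##.",
    "....",
]
print(verified_reward(maze, (0, 0), (2, 3), "RRRDD"))  # 1.0: valid path to goal
print(verified_reward(maze, (0, 0), (2, 3), "DD"))     # 0.0: stops short of goal
```

Because the checker is exact, the RL signal cannot be gamed by fluent-but-wrong answers, which is what lets a difficulty-aware curriculum probe whether training truly extends the model's capability boundary.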

cs / cs.AI