MCTS-EP: Strengthening Embodied Planning with Online Preference Optimization

Published: 2025/12/16 6:22:13

The ultimate AI is born! Embodied agents level up big time with MCTS-EP ✨

Super-short summary: How AI robots get smarter! With MCTS-EP, the goal is an ultimate AI that can handle all kinds of environments!

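🌟 Gyaru-style sparkle points ✨
● The AI thinks for itself and grows! Online learning is the best, right? 😎
● It uses all kinds of info (text, images, and more) to act like a human, for real⁉️
● A future where AI robots can do all kinds of jobs, sooo excited 🎵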

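🌟 Detailed breakdown
● Background: AI (artificial intelligence) has been evolving super fast lately! LLMs (large language models) and VLMs (vision-language models) in particular are wild: they understand all kinds of information and can do all kinds of things! 😳 But for AI robots to act in complex environments, they need to get even smarter! That's where MCTS-EP comes in!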

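● Method: MCTS-EP uses a method called MCTS (Monte Carlo Tree Search) to let the AI robot try out all kinds of experiences and learn from them! 🤔 On top of that, it uses preference optimization (learning from which option is preferred), so the AI can apparently judge for itself which action is better! So cool!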
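To make the "which action is better" idea a bit more concrete, here is a tiny sketch of how tree-search statistics could be turned into a preference pair. This is not the authors' code: the `Node` class, the UCB1 selection rule, the exploration constant `c`, and the "highest mean value vs. lowest mean value" pairing are all assumptions made up for illustration, and the action strings are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One tree node (hypothetical): the action that led here plus running stats."""
    action: str
    visits: int = 0
    total_value: float = 0.0
    children: List["Node"] = field(default_factory=list)

    @property
    def mean_value(self) -> float:
        # Average return observed through this node; 0.0 if never visited.
        return self.total_value / self.visits if self.visits else 0.0


def ucb_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """UCB1: mean value plus an exploration bonus for rarely tried actions."""
    if child.visits == 0:
        return math.inf  # always try unvisited actions first
    return child.mean_value + c * math.sqrt(math.log(parent.visits) / child.visits)


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCB1 score (the MCTS selection step)."""
    return max(parent.children, key=lambda ch: ucb_score(parent, ch))


def preference_pair(parent: Node) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) actions among visited children, or None.

    Assumption: the action with the highest mean value is "chosen" and the
    one with the lowest is "rejected"; ties produce no pair.
    """
    visited = [ch for ch in parent.children if ch.visits > 0]
    if len(visited) < 2:
        return None
    best = max(visited, key=lambda ch: ch.mean_value)
    worst = min(visited, key=lambda ch: ch.mean_value)
    if best.mean_value == worst.mean_value:
        return None
    return best.action, worst.action


# Usage with made-up statistics for two candidate actions.
root = Node("root", visits=10, children=[
    Node("open fridge", visits=6, total_value=4.8),
    Node("go to sink", visits=4, total_value=1.0),
])
print(preference_pair(root))  # ('open fridge', 'go to sink')
```

Pairs like this, collected while the agent explores, are the kind of data a preference-optimization step can then train on.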

MCTS-EP: Empowering Embodied Planning with Online Preference Optimization

Hang Xu / Zang Yu / Yehui Tang / Pengbo Hu / Yuhao Tang / Hao Dong

This paper introduces MCTS-EP, an online learning framework that combines large language models (LLMs) with Monte Carlo Tree Search (MCTS) for training embodied agents. MCTS-EP integrates three key components: MCTS-guided exploration for preference data collection, an efficient multi-modal reasoning mechanism, and an iterative training pipeline based on preference optimization. We theoretically prove that MCTS-EP achieves better performance bounds than conventional on-policy algorithms when the loss function is strongly convex, and demonstrate that it can be formulated as a search-enhanced variant of GAIL. MCTS-EP achieves state-of-the-art performance across several benchmarks. In ALFWorld, it achieves 92% and 87% success rates for textual and visual tasks, respectively. In WebShop, it reaches an average reward of 0.81. MCTS-EP also reduces average interaction steps from 18.7/19.5 to 10.2/9.9 in visual ALFWorld. Code available at: https://github.com/xuhang-2/Embodied-Agent-Planning
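The abstract does not state which preference-optimization objective the training pipeline uses. Purely as an illustration, the sketch below computes the Direct Preference Optimization (DPO) loss for a single (chosen, rejected) pair, one widely used objective of this kind. The function name, argument names, and the default `beta` are assumptions, not taken from the paper or its repository.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log(sigmoid(margin)).

    Each argument is the total log-probability a model assigns to the
    chosen or rejected response; `ref_*` come from a frozen reference model.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# Raising the chosen response's likelihood relative to the rejected one lowers the loss.
print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is how preference pairs gathered during search could steer an iterative training loop.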

cs / cs.AI
