Published: 2026/1/5 15:24:05

The ultimate VA is born! The CORE framework is amazing 💖

Ultra-short summary: a new technique for making VAs (virtual agents) smarter! 🤖✨

Gal-style sparkle points ✨
● The VA gets smarter with zero manual reward setup — divine! 🙏
● Works on all kinds of apps and websites — totally awesome! 💻
● AI development gets easier, for a future where everyone's happy! 🥰

Detailed explanation

Background
VAs (virtual agents) are amazing helpers that automate all sorts of operations for you! ✨ But with conventional techniques, making a VA smart was a real struggle 😭 With BC (behavior cloning) the behavior ends up monotonous, and with RL (reinforcement learning) designing the reward is hard… 🌀

Method
Enter the CORE framework! 💖 It automatically builds a reward function (a Label Function) from demonstrations (expert examples)! 😳 A technique called Semantic Code Abstraction cuts out the reward-design hassle of RL while keeping the stability of BC 👍 On top of that, Strategy Graph Expansion learns diverse behavior patterns, boosting generalization! 🚀 And Trajectory-Guided Extrapolation gets you even more diversity! 🌈
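To make the Label Function idea concrete: the paper describes it as executable code that verifies one key step within a task. Here is a minimal sketch in Python, where the trajectory format, function name, and UI element names are all hypothetical illustrations, not the paper's actual code:

```python
# Hypothetical Label Function: executable code that checks one key
# step of a task (here, "the agent submitted a search query") and
# returns a reward. All names and the step format are illustrative.

def label_fn_search_submitted(trajectory):
    """Return 1.0 if the agent typed into the search box and then
    clicked submit somewhere in the trajectory, else 0.0."""
    typed = any(
        step["action"] == "type" and step["target"] == "search_box"
        for step in trajectory
    )
    clicked = any(
        step["action"] == "click" and step["target"] == "submit_button"
        for step in trajectory
    )
    return 1.0 if typed and clicked else 0.0

# A trajectory is a list of recorded (action, target) steps.
demo = [
    {"action": "click", "target": "search_box"},
    {"action": "type", "target": "search_box"},
    {"action": "click", "target": "submit_button"},
]
print(label_fn_search_submitted(demo))  # → 1.0
```

Because the check is plain executable code, it can score new rollouts automatically, which is what lets CORE drop manual reward design.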

Read the rest in the 「らくらく論文」 app

CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents

Keyu Wang / Bingchen Miao / Wendong Bu / Yu Wu / Juncheng Li / Shengyu Zhang / Wenqiao Zhang / Siliang Tang / Jun Xiao / Yueting Zhuang

The development of Multimodal Virtual Agents has made significant progress through the integration of Multimodal Large Language Models. However, mainstream training paradigms face key challenges: Behavior Cloning is simple and effective through imitation but suffers from low behavioral diversity, while Reinforcement Learning is capable of discovering novel strategies through exploration but heavily relies on manually designed reward functions. To address the conflict between these two methods, we present CORE, a Code-based Inverse Self-Training Framework with Graph Expansion that bridges imitation and exploration, offering a novel training framework that promotes behavioral diversity while eliminating the reliance on manual reward design. Specifically, we introduce Semantic Code Abstraction to automatically infer reward functions from expert demonstrations without manual design. The inferred reward function, referred to as the Label Function, is executable code that verifies one key step within a task. Building on this, we propose Strategy Graph Expansion to enhance in-domain behavioral diversity, which constructs a multi-path graph called the Strategy Graph that captures diverse valid solutions beyond expert demonstrations. Furthermore, we introduce Trajectory-Guided Extrapolation, which enriches out-of-domain behavioral diversity by utilizing both successful and failed trajectories to expand the task space. Experiments on Web and Android platforms demonstrate that CORE significantly improves both overall performance and generalization, highlighting its potential as a robust and generalizable training paradigm for building powerful virtual agents.
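The Strategy Graph described above can be pictured as a multi-path graph whose edges are actions and whose root-to-goal paths are alternative valid solutions. A minimal sketch, assuming a toy state/action representation (the class, states, and actions are illustrative, not the paper's implementation):

```python
# Illustrative Strategy Graph sketch: merge demonstrated trajectories
# into one graph, then enumerate every valid action sequence from a
# start state to a goal state. Structure and names are assumptions.
from collections import defaultdict

class StrategyGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # state -> [(action, next_state)]

    def add_demonstration(self, states, actions):
        """Merge one trajectory (expert demo or verified rollout)."""
        for s, a, s2 in zip(states, actions, states[1:]):
            if (a, s2) not in self.edges[s]:
                self.edges[s].append((a, s2))

    def solutions(self, start, goal, path=()):
        """Enumerate all action sequences leading from start to goal."""
        if start == goal:
            yield list(path)
            return
        for a, s2 in self.edges[start]:
            yield from self.solutions(s2, goal, path + (a,))

g = StrategyGraph()
# Two different valid ways to reach the same goal state.
g.add_demonstration(["home", "search", "results"], ["open_search", "submit"])
g.add_demonstration(["home", "menu", "results"], ["open_menu", "pick_item"])
print(sorted(g.solutions("home", "results")))
# → [['open_menu', 'pick_item'], ['open_search', 'submit']]
```

Merging trajectories this way captures several valid solutions per task, which is how the graph provides training targets more diverse than any single expert demonstration.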

cs / cs.LG / cs.CV