Published: 2025/10/23 6:36:18

The ultimate gyaru AI has arrived~! ✨ Today we're breaking down a seriously cool-sounding study called "OpTI-BFM"! Ready? Let's go!

OpTI-BFM is born! A next-gen AI 🚀

Super summary: An AI that can act smart even without rewards! A data-saving trick!

Gyaru-style sparkle points ✨

● Getting smarter without any rewards (treats)? That's like being a natural no-makeup beauty, right? 💎
● Data collection gets easier, so the AI grows as casually as posting on social media 💖
● A revolution for the IT industry! The potential to shine in all kinds of fields is just too much! 🥺


Optimistic Task Inference for Behavior Foundation Models

Thomas Rupf / Marco Bagatella / Marin Vlastelica / Andreas Krause

Behavior Foundation Models (BFMs) are capable of retrieving a high-performing policy for any reward function specified directly at test-time, commonly referred to as zero-shot reinforcement learning (RL). While this is a very efficient process in terms of compute, it can be less so in terms of data: as a standard assumption, BFMs require computing rewards over a non-negligible inference dataset, assuming either access to a functional form of rewards, or significant labeling efforts. To alleviate these limitations, we tackle the problem of task inference purely through interaction with the environment at test-time. We propose OpTI-BFM, an optimistic decision criterion that directly models uncertainty over reward functions and guides BFMs in data collection for task inference. Formally, we provide a regret bound for well-trained BFMs through a direct connection to upper-confidence algorithms for linear bandits. Empirically, we evaluate OpTI-BFM on established zero-shot benchmarks, and observe that it enables successor-features-based BFMs to identify and optimize an unseen reward function in a handful of episodes with minimal compute overhead. Code is available at https://github.com/ThomasRupf/opti-bfm.
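The abstract ties the method to upper-confidence algorithms for linear bandits: the unknown reward is treated as (approximately) linear in known features, and each test-time episode's task vector is chosen optimistically from a confidence set over the reward weights. The sketch below is only a rough illustration of that idea under assumptions, not the authors' implementation (see the linked repository). The `bfm` object with `phi`, `expected_features`, `candidate_tasks`, and `policy`, and the gym-style `env`, are hypothetical stand-ins; the update is a standard LinUCB-style ridge-regression ellipsoid.

```python
import numpy as np

def optimistic_task_inference(env, bfm, d, episodes=10, lam=1.0, beta=2.0):
    """Illustrative sketch: infer unknown reward weights w (r(s) ~ phi(s) @ w)
    purely from interaction, picking each episode's task vector optimistically."""
    A = lam * np.eye(d)   # regularised design matrix over reward features
    b = np.zeros(d)       # accumulated phi(s) * observed reward

    for _ in range(episodes):
        w_hat = np.linalg.solve(A, b)   # ridge estimate of the reward weights
        A_inv = np.linalg.inv(A)

        # Optimism: score a candidate task vector by its estimated return plus
        # an exploration bonus from the confidence ellipsoid over w.
        def ucb(z):
            feat = bfm.expected_features(z)   # hypothetical: expected features under pi_z
            bonus = beta * np.sqrt(feat @ A_inv @ feat)
            return feat @ w_hat + bonus

        z = max(bfm.candidate_tasks(), key=ucb)   # hypothetical candidate set

        # Roll out the zero-shot policy for z and update the reward model online.
        s = env.reset()
        done = False
        while not done:
            a = bfm.policy(z, s)                  # hypothetical zero-shot policy
            s_next, r, done, _ = env.step(a)
            f = bfm.phi(s)                        # reward features of the visited state
            A += np.outer(f, f)
            b += f * r
            s = s_next

    return np.linalg.solve(A, b)   # final estimate of the reward weights
```

The point of the optimistic choice of `z` is that data collection is steered toward states that are informative about the unknown reward, which is how the paper reports identifying an unseen reward function in a handful of episodes.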

cs / cs.LG