Published: 2025/12/3 18:49:54

The Ultimate AI Is Born! Even Vague Questions Are No Problem ☆

AI has gotten smarter and can now handle vague (ambiguous) questions perfectly! A business opportunity is coming~ 😎✨

💎 Gyaru-Style Sparkle Points ✨
● The AI learns for itself when to ask a question and how to present its answers. Isn't that super smart? 🤩
● It picks the best way to answer based on the situation (like screen size), which is just divine! 🙏
● There are so many ways to use this in business that the future looks way too exciting~ 🥰

Detailed Explanation

Background: AI assistants get stuck when a user's question is ambiguous, right? 😢 For example, "When does that movie come out?" This research found a way to teach the AI strategies (asking a clarifying question, presenting multiple possible answers, and so on) so it can respond smartly!

Read the rest in the 「らくらく論文」 app

Learning Steerable Clarification Policies with Collaborative Self-play

Jonathan Berant / Maximillian Chen / Adam Fisch / Reza Aghajani / Fantine Huot / Mirella Lapata / Jacob Eisenstein

To handle underspecified or ambiguous queries, AI assistants need a policy for managing their uncertainty to determine (a) when to guess the user intent and answer directly, (b) when to enumerate and answer multiple possible intents, and (c) when to ask a clarifying question. However, such policies are contextually dependent on factors such as user preferences or modality. For example, enumerating multiple possible user intentions is cumbersome on small screens or in a voice setting. In this work, we propose to train steerable policies for managing this uncertainty using self-play. Given two agents, one simulating a user and the other an AI assistant, we generate conversations where the user issues a potentially ambiguous query, and the assistant needs to determine how to respond. Importantly, the model takes as input the numerical cost of each clarification question, and each generated word, and is asked to take the action that will maximize its final reward, which is the cost-penalized accuracy. We use Reinforced Self-Training (ReST) to train our model to achieve high reward and show this leads to a steerable policy that changes its behavior predictably conditioned on the provided costs, leading to higher reward and accuracy. Moreover, our procedure also generalizes to numerical cost values that were unobserved at training time.
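The reward described in the abstract can be sketched in a few lines of Python. Everything below (function name, accuracy estimates, cost values) is an illustrative assumption, not the paper's implementation: the assistant's final reward is its answer accuracy, penalized by a cost for each clarifying question and for each generated word.

```python
# Minimal sketch (not the authors' code) of the cost-penalized reward:
# reward = accuracy - question_cost * (#clarifying questions)
#                   - word_cost * (#generated words).
# All names and numbers here are hypothetical.

def expected_reward(p_correct: float, num_questions: int, num_words: int,
                    question_cost: float, word_cost: float) -> float:
    """Expected cost-penalized accuracy; costs are paid regardless of outcome."""
    penalty = question_cost * num_questions + word_cost * num_words
    return p_correct - penalty

# Steering via costs: with a high per-word cost (e.g. a small screen or a
# voice setting), a terse direct guess can outscore enumerating every intent.
terse_guess   = expected_reward(0.70, 0, 10, 0.2, 0.01)  # ~0.60
enumerate_all = expected_reward(1.00, 0, 80, 0.2, 0.01)  # ~0.20
clarify_first = expected_reward(0.95, 1, 15, 0.2, 0.01)  # ~0.60
```

Because these costs are inputs to the policy, raising the per-word cost predictably shifts the trained assistant away from long enumerations and toward direct answers or a short clarifying question, which is the steerability the abstract describes.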

cs / cs.LG