A gal breaks down the latest tech so anyone can get it!
✨ Gal-Style Sparkle Points ✨
● It watches what agents do to stop bad stuff (like unauthorized use)!
● It makes what the AI is doing "visible" so everyone can use it with peace of mind ✨
● It also seems to help protect intellectual property (like the AI's ideas)!
Here comes the detailed rundown~!
Background: Some of today's AI agents are seriously impressive, thinking and acting on their own, right? But bad actors could rip off that technology or put it to shady uses 😱 That's why we need to properly monitor what the AI does and make it safe to use!
LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, enabling deployment under black-box APIs while remaining compatible with action-layer content watermarking. Experiments across embodied, tool-use, and social environments demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is available at https://github.com/Tooooa/AgentMark.
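The abstract's core idea, embedding multi-bit identifiers into planning decisions via distribution-preserving conditional sampling, can be illustrated with a minimal sketch. This is not the paper's actual algorithm (see the linked repository for that); it is a generic keyed-partition scheme under assumed names (`prf_group`, `embed_step`, `decode_step`): a secret key pseudorandomly partitions each step's candidate actions into groups, the agent samples within the group encoding the current message chunk, and a verifier with the key recovers the chunk from any logged step independently.

```python
import hashlib
import random

def prf_group(key: str, step: int, action: str, n_bits: int = 1) -> int:
    """Keyed pseudorandom partition: map (step, action) to one of 2**n_bits groups."""
    h = hashlib.sha256(f"{key}|{step}|{action}".encode()).digest()
    return h[0] % (2 ** n_bits)

def embed_step(dist: dict, key: str, step: int, chunk: int, n_bits: int = 1) -> str:
    """Sample one planning action from `dist` ({action: probability}) so that
    its PRF group encodes `chunk`. Conditional sampling keeps the relative
    probabilities within the group intact; marginalized over random keys,
    the overall behavior distribution is preserved."""
    group = {a: p for a, p in dist.items()
             if prf_group(key, step, a, n_bits) == chunk}
    if not group:
        group = dist  # fallback: no candidate carries this chunk at this step
    actions, probs = zip(*group.items())
    return random.choices(actions, weights=probs, k=1)[0]

def decode_step(key: str, step: int, action: str, n_bits: int = 1) -> int:
    """Recover the embedded chunk from a single logged (step, action) pair."""
    return prf_group(key, step, action, n_bits)

# Toy trace: 8 planning steps, 8 equally likely tool choices each, 1 bit/step.
key = "secret-key"
dist = {f"tool_{i}": 0.125 for i in range(8)}
message = [1, 0, 1, 1, 0, 0, 1, 0]
trace = [embed_step(dist, key, t, b) for t, b in enumerate(message)]
decoded = [decode_step(key, t, a) for t, a in enumerate(trace)]
```

Because each step decodes independently, a verifier who sees only a partial log still recovers the bits of the surviving steps, which matches the robustness property the abstract claims for partial logs.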