A gal breaks down the latest tech so anyone can get it!
✨ Gal-Style Sparkle Points ✨
● It watches what agents do to stop bad stuff (like unauthorized use)!
● It makes what the AI is doing "visible" so everyone can use it with peace of mind ✨
● It also seems to help protect intellectual property (like the AI's ideas)!
Here comes the detailed rundown~!
Background: Some of today's AI agents are seriously impressive, thinking and acting on their own, right? But bad actors could rip off that technology or put it to shady uses 😱 That's why we need to properly monitor what the AI does and make it safe to use!
LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, enabling deployment under black-box APIs while remaining compatible with action-layer content watermarking. Experiments across embodied, tool-use, and social environments demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is available at https://github.com/Tooooa/AgentMark.
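The abstract's core idea, embedding multi-bit identifiers into planning decisions via distribution-preserving conditional sampling, can be illustrated with a minimal sketch. This is not the paper's actual algorithm (see the linked repository for that); it is a generic keyed-partition scheme under assumed names (`prf_group`, `embed_step`, `decode_step`): a secret key pseudorandomly partitions each step's candidate actions into groups, the agent samples within the group encoding the current message chunk, and a verifier with the key recovers the chunk from any logged step independently.

```python
import hashlib
import random

def prf_group(key: str, step: int, action: str, n_bits: int = 1) -> int:
    """Keyed pseudorandom partition: map (step, action) to one of 2**n_bits groups."""
    h = hashlib.sha256(f"{key}|{step}|{action}".encode()).digest()
    return h[0] % (2 ** n_bits)

def embed_step(dist: dict, key: str, step: int, chunk: int, n_bits: int = 1) -> str:
    """Sample one planning action from `dist` ({action: probability}) so that
    its PRF group encodes `chunk`. Conditional sampling keeps the relative
    probabilities within the group intact; marginalized over random keys,
    the overall behavior distribution is preserved."""
    group = {a: p for a, p in dist.items()
             if prf_group(key, step, a, n_bits) == chunk}
    if not group:
        group = dist  # fallback: no candidate carries this chunk at this step
    actions, probs = zip(*group.items())
    return random.choices(actions, weights=probs, k=1)[0]

def decode_step(key: str, step: int, action: str, n_bits: int = 1) -> int:
    """Recover the embedded chunk from a single logged (step, action) pair."""
    return prf_group(key, step, action, n_bits)

# Toy trace: 8 planning steps, 8 equally likely tool choices each, 1 bit/step.
key = "secret-key"
dist = {f"tool_{i}": 0.125 for i in range(8)}
message = [1, 0, 1, 1, 0, 0, 1, 0]
trace = [embed_step(dist, key, t, b) for t, b in enumerate(message)]
decoded = [decode_step(key, t, a) for t, a in enumerate(trace)]
```

Because each step decodes independently, a verifier who sees only a partial log still recovers the bits of the surviving steps, which matches the robustness property the abstract claims for partial logs.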