TL;DR: Render an LLM agent's history as an image to cut costs and speed things way up!
✨ Sparkle Points ✨
● Turning the history into an image is a genius idea💡 You can pack in way more info than with text, right?
● It cuts token (think: character count) consumption, so it's easy on your wallet💰
● Conversations get smoother, total win! User experience goes way up~💖
Detailed Explanation
Background: LLM (large language model) agents are smart, but long contexts are their weakness💔 As a conversation grows, there's so much information that processing gets rough. Tokens (think: character count) explode, which costs money and slows everything down... Somebody had to fix this, and that's where this research comes in✨
The full commentary continues in the「らくらく論文」app.
Original Abstract
Recent advances in large language models (LLMs) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlenecked by rapidly growing textual histories that inflate token budgets and memory usage. We introduce AgentOCR, a framework that exploits the superior information density of visual tokens by representing the accumulated observation-action history as a compact rendered image. To make multi-turn rollouts scalable, AgentOCR introduces segment optical caching: by decomposing the history into hashable segments and maintaining a visual cache, this mechanism eliminates redundant re-rendering. Beyond fixed rendering, AgentOCR adds agentic self-compression, in which the agent actively emits a compression rate and is trained with a compression-aware reward to adaptively balance task success against token efficiency. We conduct extensive experiments on challenging agentic benchmarks, ALFWorld and search-based QA. Results demonstrate that AgentOCR preserves over 95% of text-based agent performance while reducing token consumption by more than 50%, yielding consistent token and memory efficiency. Further analysis validates a 20x rendering speedup from segment optical caching and the effective strategic balancing of self-compression.
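The abstract only names segment optical caching; the paper's actual implementation is not shown here. As a minimal sketch of the idea as described (split the history into segments, key each by a hash, cache rendered images so unchanged segments are never re-rendered), the following is illustrative: `SegmentOpticalCache` and `render_segment` are hypothetical names, and the renderer is a stub rather than a real rasterizer.

```python
# Hedged sketch of segment optical caching: hash each history segment and
# cache its rendered image, so repeated rollouts over a shared prefix only
# render the new suffix. All names here are illustrative, not from the paper.
import hashlib


def render_segment(text: str) -> bytes:
    # Stub renderer: the real system would rasterize the text into image
    # tokens; encoding the text keeps this sketch self-contained.
    return text.encode("utf-8")


class SegmentOpticalCache:
    def __init__(self) -> None:
        self._cache: dict[str, bytes] = {}  # segment hash -> rendered image
        self.render_calls = 0               # counts actual render invocations

    def get(self, segment: str) -> bytes:
        key = hashlib.sha256(segment.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.render_calls += 1
            self._cache[key] = render_segment(segment)
        return self._cache[key]

    def render_history(self, segments: list[str]) -> list[bytes]:
        # Only previously unseen segments trigger a render; the shared
        # prefix of a growing history is served from the cache.
        return [self.get(s) for s in segments]
```

For example, rendering `["obs1", "act1"]` and then `["obs1", "act1", "obs2"]` performs only three renders in total, since the first two segments hit the cache on the second call; this cache-hit behavior is what the reported 20x rendering speedup would come from.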