Published: 2025/12/16 6:55:10

Astraea bursts onto the scene! Turbocharging LLM agents 🚀

A gal breaks down the paper for you, super simply! 😉

Ultra-summary: Magical scheduling that makes LLM agents blazing fast ✨

Gal-style sparkle points ✨

● A technique that makes LLM agents (like AI assistants) run way more smoothly!
● It cuts the waiting time for API calls (going out to fetch info from external services)!
● Which means you get to use your AI nice and snappily 💖

Read the rest in the 「らくらく論文」 app

Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents

Hongqiu Ni / Jiabao Zhang / Guopeng Li / Zilong Wang / Ruiqi Wu / Chi Zhang / Haisheng Tan

Large Language Models (LLMs) are increasingly being deployed as intelligent agents. Their multi-stage workflows, which alternate between local computation and calls to external network services such as Web APIs, introduce a mismatch between their execution pattern and the scheduling granularity of existing inference systems such as vLLM. Existing systems typically focus on per-segment optimization, which prevents them from minimizing the end-to-end latency of the complete agentic workflow, i.e., the global Job Completion Time (JCT) over the entire request lifecycle. To address this limitation, we propose Astraea, a service engine designed to shift the optimization from local segments to the global request lifecycle. Astraea employs a state-aware, hierarchical scheduling algorithm that integrates a request's historical state with future predictions. It dynamically classifies requests as I/O-intensive or compute-intensive and uses an enhanced HRRN policy to balance efficiency and fairness. Astraea also implements an adaptive KV cache manager that intelligently handles the agent state during I/O waits based on the system memory pressure. Extensive experiments show that Astraea reduces average JCT by up to 25.5% compared to baseline methods. Moreover, our approach demonstrates strong robustness and stability under high load across various model scales.
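The abstract names two mechanisms: an HRRN (Highest Response Ratio Next) scheduling policy applied to requests classified as I/O-intensive or compute-intensive, and an adaptive KV cache policy during I/O waits. Below is a minimal Python sketch of what such a design could look like. The class names, the compute-first tie-break, and the memory-pressure thresholds are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class AgentRequest:
    req_id: str
    arrival: float       # arrival timestamp (seconds)
    est_service: float   # estimated remaining service time (seconds)
    io_bound: bool       # True if the next stage is an external API call

def response_ratio(req: AgentRequest, now: float) -> float:
    # Classic HRRN: (wait + service) / service. Long-waiting requests and
    # short jobs both rise in priority, balancing fairness and efficiency.
    wait = now - req.arrival
    return (wait + req.est_service) / req.est_service

def pick_next(queue: list[AgentRequest], now: float) -> AgentRequest:
    # Hypothetical two-level choice: prefer compute-bound requests so the GPU
    # stays busy while I/O-bound ones overlap their network waits; within the
    # chosen class, schedule by highest response ratio.
    compute = [r for r in queue if not r.io_bound]
    pool = compute if compute else queue
    return max(pool, key=lambda r: response_ratio(r, now))

def handle_kv_on_io_wait(mem_pressure: float) -> str:
    # Hypothetical adaptive KV cache policy during an I/O wait: keep the cache
    # on the GPU under low pressure, offload it to host memory under moderate
    # pressure, and evict + recompute under high pressure. Thresholds are
    # illustrative, not from the paper.
    if mem_pressure < 0.5:
        return "retain_on_gpu"
    elif mem_pressure < 0.85:
        return "offload_to_cpu"
    return "evict_and_recompute"

if __name__ == "__main__":
    now = 100.0
    queue = [
        AgentRequest("a", arrival=90.0, est_service=2.0, io_bound=False),
        AgentRequest("b", arrival=80.0, est_service=10.0, io_bound=True),
    ]
    print(pick_next(queue, now).req_id)   # "a": compute-bound class preferred
    print(handle_kv_on_io_wait(0.7))      # "offload_to_cpu"
```

The ratio-based priority is what distinguishes HRRN from plain shortest-job-first: a long request that has waited a while eventually overtakes newly arrived short ones, which is the fairness property the abstract alludes to.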

cs / cs.CL