Published: 2026/1/7 2:57:38

An AI framework that takes mobile GUIs to the max has arrived 💖 (D-Artemis)

Super summary: Meet D-Artemis, an AI framework that operates smartphone apps 🤖 the smart way! It's super good at both learning and fixing errors ✨

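🌟 Gyaru-style sparkle points ✨
● Works with little data! No more stressing over data shortages!
● Spots errors right away! Retries are speedy too!
● Handles complex operations perfectly! Smartphone apps get even handier 😍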

Detailed explanation

Background: Smartphone apps are handy, but operating them with AI is hard, right? Collecting data is a pain, and errors are hard to catch… 😩 But D-Artemis might just solve those problems!

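Method: D-Artemis copies how the human brain 🧠 works! First, before acting, it checks "Is this OK?" If it fails, it reflects on what went wrong and carries the lesson into the next step! Amazing ✨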


D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents

Hongze Mi / Yibo Feng / Wenjie Lu / Yuqi Wang / Jinyuan Li / Song Cao / He Cui / Tengfei Tian / Xuelin Zhang / Haotian Luo / Di Sun / Jun Fang / Hua Chai / Naiqiang Tan / Gang Pan

Graphical User Interface (GUI) agents aim to automate a wide spectrum of human tasks by emulating user interaction. Despite rapid advances, current approaches are hindered by several critical challenges: the data bottleneck in end-to-end training, the high cost of delayed error detection, and the risk of contradictory guidance. Inspired by the human cognitive loop of Thinking, Alignment, and Reflection, we present D-Artemis, a novel deliberative framework. D-Artemis leverages a fine-grained, app-specific tip retrieval mechanism to inform its decision-making process. It also employs a proactive Pre-execution Alignment stage, where the Thought-Action Consistency (TAC) Check module and the Action Correction Agent (ACA) work in concert to mitigate the risk of execution failures. A post-execution Status Reflection Agent (SRA) completes the cognitive loop, enabling strategic learning from experience. Crucially, D-Artemis enhances the capabilities of general-purpose multimodal large language models (MLLMs) for GUI tasks without the need for training on complex trajectory datasets, demonstrating strong generalization. D-Artemis establishes new state-of-the-art (SOTA) results on two major benchmarks, achieving a 75.8% success rate on AndroidWorld and 96.8% on ScreenSpot-V2. Extensive ablation studies further demonstrate the significant contribution of each component to the framework.
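
The Thinking, Alignment, and Reflection loop described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: every name here (`Step`, `LoopState`, `run_loop`, and the callables passed in) is a hypothetical stand-in, and the stubs in `demo` replace the MLLM calls, the TAC Check, the ACA, and the SRA with trivial lambdas purely to show the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    thought: str  # what the agent intends to do
    action: str   # the concrete GUI action it proposes


@dataclass
class LoopState:
    history: List[str] = field(default_factory=list)   # actions actually executed
    guidance: List[str] = field(default_factory=list)  # reflections kept for later steps


def run_loop(
    task: str,
    app: str,
    retrieve_tips: Callable[[str], List[str]],          # app-specific tip retrieval
    think: Callable[[str, List[str], LoopState], Optional[Step]],
    is_consistent: Callable[[Step], bool],              # stand-in for the TAC Check
    correct: Callable[[Step], Step],                    # stand-in for the ACA
    execute: Callable[[str], str],                      # performs the action, returns an observation
    reflect: Callable[[Step, str], Optional[str]],      # stand-in for the SRA
    max_steps: int = 10,
) -> LoopState:
    state = LoopState()
    tips = retrieve_tips(app)
    for _ in range(max_steps):
        step = think(task, tips, state)                 # Thinking
        if step is None:                                # planner signals the task is done
            break
        if not is_consistent(step):                     # Pre-execution Alignment
            step = correct(step)
        observation = execute(step.action)
        state.history.append(step.action)
        note = reflect(step, observation)               # post-execution Reflection
        if note:
            state.guidance.append(note)
    return state


def demo() -> LoopState:
    # Toy plan: the second step's action deliberately contradicts its thought.
    plan = iter([
        Step(thought="tap search", action="tap search"),
        Step(thought="type query", action="tap search"),
    ])
    return run_loop(
        task="search for a note",
        app="notes",
        retrieve_tips=lambda app: [f"{app}: search lives in the top bar"],
        think=lambda task, tips, state: next(plan, None),
        is_consistent=lambda s: s.action == s.thought,
        correct=lambda s: Step(s.thought, s.thought),
        execute=lambda action: f"done: {action}",
        reflect=lambda s, obs: None if obs.startswith("done") else "retry",
    )
```

In this toy run the mismatched second step is corrected before it is executed, which is the point of checking before acting: the error is caught without spending a GUI action on it.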

cs / cs.AI