Ultra-short summary: A genius team that uses LLMs to find software weaknesses has arrived! They debate like in a courtroom to boost accuracy!
✨ Gal-style sparkle points ✨ ● The AIs argue it out like a trial! So fun~! ● Accuracy skyrockets with the latest LLM (GPT-4o)! ● It gives explanations too, so you totally get why the code is bad 👍
Here comes the detailed rundown~!
Background: Hunting for software weaknesses is a pain, right? This research says using an LLM (a super-smart AI) could make it easier! But an LLM on its own isn't quite enough, so they came up with an even cooler method!
Detecting vulnerabilities in source code remains a critical yet challenging task, especially when benign and vulnerable functions share significant similarities. In this work, we introduce VulTrial, a courtroom-inspired multi-agent framework designed to identify vulnerable code and to provide explanations. It employs four role-specific agents: a security researcher, a code author, a moderator, and a review board. Using GPT-4o as the base LLM, VulTrial almost doubles the efficacy of the prior best-performing baselines. Additionally, we show that role-specific instruction tuning with small quantities of data further boosts VulTrial's efficacy significantly. Our extensive experiments demonstrate the efficacy of VulTrial across different LLMs, including an open-source, in-house-deployable model (LLaMA-3.1-8B), as well as the high quality of its generated explanations and its ability to uncover multiple confirmed zero-day vulnerabilities in the wild.
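To make the courtroom metaphor concrete, the four-agent flow described above can be sketched as a simple orchestration loop. This is a minimal illustrative sketch, not VulTrial's actual implementation: the role names come from the abstract, but the turn order, prompts, number of debate rounds, and the `call_llm` stub (which stands in for a real chat-completion call to GPT-4o or LLaMA-3.1-8B) are all assumptions for illustration.

```python
# Hedged sketch of a courtroom-style multi-agent review loop.
# Role names follow the VulTrial abstract; the prompts, turn order,
# and verdict format here are illustrative assumptions.

ROLES = {
    "security_researcher": "Argue why the code may be vulnerable.",
    "code_author": "Defend the code against the researcher's claims.",
    "moderator": "Summarize both sides' arguments neutrally.",
    "review_board": "Issue a final verdict with reasons.",
}

def call_llm(role, instruction, transcript):
    """Placeholder for a chat-completion call (e.g. GPT-4o).
    Returns a canned string so the orchestration is runnable."""
    return f"[{role}] {instruction}"

def courtroom_trial(code_snippet, rounds=2):
    """Run a fixed number of debate rounds, then summarize and decide."""
    transcript = [f"Code under review:\n{code_snippet}"]
    # Prosecution (researcher) and defense (author) alternate turns.
    for _ in range(rounds):
        for role in ("security_researcher", "code_author"):
            transcript.append(call_llm(role, ROLES[role], transcript))
    # The moderator condenses the debate before the board decides.
    transcript.append(call_llm("moderator", ROLES["moderator"], transcript))
    verdict = call_llm("review_board", ROLES["review_board"], transcript)
    return verdict, transcript
```

In a real deployment, `call_llm` would carry each agent's role-specific system prompt plus the running transcript into the model, and the review board's verdict would be parsed into a vulnerable/benign label with its explanation.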