Ultra-short summary: A genius team that uses LLMs to find software weaknesses has arrived! They debate like in a courtroom to boost accuracy!
✨ Gal-style sparkle points ✨ ● The AIs argue it out like a trial! So fun~! ● Accuracy skyrockets with the latest LLM (GPT-4o)! ● It gives explanations too, so you totally get why the code is bad 👍
Here comes the detailed rundown~!
Background: Hunting for software weaknesses is a pain, right? This research says using an LLM (a super-smart AI) could make it easier! But an LLM on its own isn't quite enough, so they came up with an even cooler method!
Detecting vulnerabilities in source code remains a critical yet challenging task, especially when benign and vulnerable functions share significant similarities. In this work, we introduce VulTrial, a courtroom-inspired multi-agent framework designed to identify vulnerable code and to provide explanations. It employs four role-specific agents: a security researcher, a code author, a moderator, and a review board. Using GPT-4o as the base LLM, VulTrial almost doubles the efficacy of the prior best-performing baselines. Additionally, we show that role-specific instruction tuning with small quantities of data further boosts VulTrial's efficacy significantly. Our extensive experiments demonstrate the efficacy of VulTrial across different LLMs, including an open-source, in-house-deployable model (LLaMA-3.1-8B), as well as the high quality of its generated explanations and its ability to uncover multiple confirmed zero-day vulnerabilities in the wild.
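To make the courtroom metaphor concrete, the four-agent flow described above can be sketched as a simple orchestration loop. This is a minimal illustrative sketch, not VulTrial's actual implementation: the role names come from the abstract, but the turn order, prompts, number of debate rounds, and the `call_llm` stub (which stands in for a real chat-completion call to GPT-4o or LLaMA-3.1-8B) are all assumptions for illustration.

```python
# Hedged sketch of a courtroom-style multi-agent review loop.
# Role names follow the VulTrial abstract; the prompts, turn order,
# and verdict format here are illustrative assumptions.

ROLES = {
    "security_researcher": "Argue why the code may be vulnerable.",
    "code_author": "Defend the code against the researcher's claims.",
    "moderator": "Summarize both sides' arguments neutrally.",
    "review_board": "Issue a final verdict with reasons.",
}

def call_llm(role, instruction, transcript):
    """Placeholder for a chat-completion call (e.g. GPT-4o).
    Returns a canned string so the orchestration is runnable."""
    return f"[{role}] {instruction}"

def courtroom_trial(code_snippet, rounds=2):
    """Run a fixed number of debate rounds, then summarize and decide."""
    transcript = [f"Code under review:\n{code_snippet}"]
    # Prosecution (researcher) and defense (author) alternate turns.
    for _ in range(rounds):
        for role in ("security_researcher", "code_author"):
            transcript.append(call_llm(role, ROLES[role], transcript))
    # The moderator condenses the debate before the board decides.
    transcript.append(call_llm("moderator", ROLES["moderator"], transcript))
    verdict = call_llm("review_board", ROLES["review_board"], transcript)
    return verdict, transcript
```

In a real deployment, `call_llm` would carry each agent's role-specific system prompt plus the running transcript into the model, and the review board's verdict would be parsed into a vulnerable/benign label with its explanation.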