Ultra-short summary: Automating web vulnerability assessment with LLMs, boosting accuracy!
✨ Sparkly gal-style highlights ✨ ● The novelty of dynamically generating expert agents during testing 💡 ● 3x the success rate of existing methods on CVE-Bench — totally wild, right? ● Contributes to lower cost, higher accuracy, and better scalability for security assessments 💖
Detailed explanation ● Background: Web app security means constantly checking for vulnerabilities (weak points) to keep up with ever-evolving attacks 🙅♀️ But traditional assessments relied on human experts, which cost serious time and money 💸 Automation with LLMs (large language models) had been tried too, but existing methods had their limits 🤔
● Method: PenForge is a system that generates expert agents on the spot during testing, based on the app's context (its situation)! There's an AI called the Meta-Planner that oversees everything — apparently it plans attack strategies and creates the agents 😳 On a benchmark called CVE-Bench, it succeeded 3x more often than other approaches! S-so impressive…!
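The idea of a Meta-Planner forging expert agents from reconnaissance context can be sketched roughly as below. This is a minimal illustrative sketch only — the class names (`MetaPlanner`, `ExpertAgent`, `build_agent`) and the context fields are hypothetical, not PenForge's actual API, and the `exploit` method is a stub where a real system would drive an LLM.

```python
from dataclasses import dataclass

@dataclass
class ExpertAgent:
    """An agent specialized for one vulnerability class (illustrative)."""
    specialty: str
    system_prompt: str

    def exploit(self, target: str) -> str:
        # A real agent would prompt an LLM with self.system_prompt and
        # execute tools; this stub just reports the attempted action.
        return f"[{self.specialty}] probing {target}"

class MetaPlanner:
    """Plans the attack and forges expert agents from recon context."""

    def build_agent(self, context: dict) -> ExpertAgent:
        # Context comes from automated reconnaissance of the attack surface.
        vuln = context["suspected_vulnerability"]
        prompt = (
            f"You are an expert in {vuln} exploitation. "
            f"Target stack: {context['stack']}. "
            "Propose and execute a minimal proof-of-concept exploit."
        )
        return ExpertAgent(specialty=vuln, system_prompt=prompt)

# Usage: recon yields context; the planner forges a matching expert on the fly.
recon = {"suspected_vulnerability": "SQL injection", "stack": "PHP/MySQL"}
agent = MetaPlanner().build_agent(recon)
print(agent.exploit("http://demo.local/login"))
```

The point of the design is that no expert agent exists before testing starts: specialization is derived from the observed context, which is what lets the system adapt across diverse vulnerability types.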
Penetration testing is essential for identifying vulnerabilities in web applications before real adversaries can exploit them. Recent work has explored automating this process with Large Language Model (LLM)-powered agents, but existing approaches either rely on a single generic agent that struggles in complex scenarios or on narrowly specialized agents that cannot adapt to diverse vulnerability types. We therefore introduce PenForge, a framework that dynamically constructs expert agents during testing rather than relying on agents prepared beforehand. By integrating automated reconnaissance of potential attack surfaces with agents instantiated on the fly for context-aware exploitation, PenForge achieves a 30.0% exploit success rate (12/40) on CVE-Bench in the particularly challenging zero-day setting, a threefold improvement over the state of the art. Our analysis also identifies three opportunities for future work: (1) supplying richer tool-usage knowledge to improve exploitation effectiveness; (2) extending benchmarks to include more vulnerabilities and attack types; and (3) fostering developer trust by incorporating explainable mechanisms and human review. As an emerging result with substantial potential impact, PenForge embodies the early-stage yet paradigm-shifting idea of on-the-fly agent construction, marking a promising step toward scalable and effective LLM-driven penetration testing.