Published: 2026/1/4 16:41:33

OpenRT has arrived! Red Teaming to keep MLLMs safe 🎉

Ultra-short summary: "OpenRT", a tool for checking the safety of MLLMs (super-smart AI), is seriously impressive!

✨ Sparkly highlight points ✨

● You can try out 37 different attack methods, like testing cosmetics one by one 🥺
● It's engineered to run smoothly even on large-scale MLLMs, how capable is that! ✨
● And it's open source (anyone can use it), so generous! Everyone can help make MLLMs safer 💖


Background

MLLMs are amazing AI models that can process all kinds of information (images, audio, and more) 😳✨ They're rising stars expected to shine in many fields, but they have a risky side too… 😱 For example, bad actors might try to trick an MLLM into doing "bad things"! That's exactly why their safety needs to be checked thoroughly 😉


OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Xin Wang / Yunhao Chen / Juncheng Li / Yixu Wang / Yang Yao / Tianle Gu / Jie Li / Yan Teng / Xingjun Ma / Yingchun Wang / Xia Hu

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for systematic evaluation. To address this, we introduce OpenRT, a unified, modular, and high-throughput red-teaming framework designed for comprehensive MLLM safety evaluation. At its core, OpenRT architects a paradigm shift in automated red-teaming by introducing an adversarial kernel that enables modular separation across five critical dimensions: model integration, dataset management, attack strategies, judging methods, and evaluation metrics. By standardizing attack interfaces, it decouples adversarial logic from a high-throughput asynchronous runtime, enabling systematic scaling across diverse models. Our framework integrates 37 diverse attack methodologies, spanning white-box gradients, multi-modal perturbations, and sophisticated multi-agent evolutionary strategies. Through an extensive empirical study on 20 advanced models (including GPT-5.2, Claude 4.5, and Gemini 3 Pro), we expose critical safety gaps: even frontier models fail to generalize across attack paradigms, with leading models exhibiting average Attack Success Rates as high as 49.14%. Notably, our findings reveal that reasoning models do not inherently possess superior robustness against complex, multi-turn jailbreaks. By open-sourcing OpenRT, we provide a sustainable, extensible, and continuously maintained infrastructure that accelerates the development and standardization of AI safety.

cs / cs.CR / cs.CV