Published: 2026/1/4 22:50:23

Ultimate gal takes on cognitive collusion attacks! We found the LLMs' weak spot 💖

  1. Title & Super Summary: Even LLMs (the smart AIs) can be fooled!? What's a "cognitive collusion attack"? 🧐

  2. Gal-style sparkle points ✨ ● They found the LLMs' "overthinking" weakness! 🧠 ● They built a scheme that fools the AI using only true facts, no lies needed! 😈 ● And they're doing this research to make AI safer — how emo is that? 😭

  3. Detailed explanation

    • Background: Recent AIs (LLMs) keep getting smarter, but it turns out they have a weakness 🤔 Just like humans, they can jump to weird conclusions from little scraps of information! 💦 Attackers exploit exactly that, and the result is the new "cognitive collusion attack"!

    • Method: The researchers came up with a scheme where colluding agents post only true pieces of evidence, arranged so the AI goes "wait, maybe this is real!" 🎬 Just like editing a movie, they reorder the information fragments (a "montage"!) to steer the AI toward a false conclusion, with a Writer, an Editor, and a Director agent working together!
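The Writer-Editor-Director idea above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the fragment texts, the keyword-based scoring heuristic, and the exhaustive-permutation "writer" are all invented here for clarity — the actual framework uses LLM agents and adversarial debate.

```python
# Toy sketch of the "Generative Montage" Writer-Editor-Director loop.
# Role names follow the paper; all data and scoring logic are hypothetical.
from itertools import permutations

# Truthful evidence fragments (each individually true; example data is made up).
fragments = [
    "The factory passed its safety inspection in March.",
    "Two employees were hospitalized in April.",
    "The inspector resigned in May.",
]

def writer(frags):
    """Writer: propose candidate narrative orderings (montages)."""
    return list(permutations(frags))

def editor(ordering, target_keywords):
    """Editor: score how strongly an ordering suggests the target (false)
    conclusion -- a toy heuristic that rewards placing alarming fragments
    late in the narrative."""
    return sum(i for i, frag in enumerate(ordering)
               if any(k in frag for k in target_keywords))

def director(candidates, target_keywords):
    """Director: select the montage the editor scores highest."""
    return max(candidates, key=lambda o: editor(o, target_keywords))

montage = director(writer(fragments),
                   target_keywords=["hospitalized", "resigned"])
# Every fragment is true, but this ordering implies a cover-up narrative.
print(" -> ".join(montage))
```

The point of the sketch: no fragment is a lie, yet the chosen ordering (inspection → hospitalizations → resignation) invites a reader, or an overthinking LLM, to infer a causal cover-up story that none of the fragments actually states.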

Read the rest in the らくらく論文 app

Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage

Jinwei Hu / Xinmiao Huang / Youcheng Sun / Yi Dong / Xiaowei Huang

As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an unexpected attack surface. This paper introduces a novel threat where colluding agents steer victim beliefs using only truthful evidence fragments distributed through public channels, without relying on covert communications, backdoors, or falsified documents. By exploiting LLMs' overthinking tendency, we formalize the first cognitive collusion attack and propose Generative Montage: a Writer-Editor-Director framework that constructs deceptive narratives through adversarial debate and coordinated posting of evidence fragments, causing victims to internalize and propagate fabricated conclusions. To study this risk, we develop CoPHEME, a dataset derived from real-world rumor events, and simulate attacks across diverse LLM families. Our results show pervasive vulnerability across 14 LLM families: attack success rates reach 74.4% for proprietary models and 70.6% for open-weights models. Counterintuitively, stronger reasoning capabilities increase susceptibility, with reasoning-specialized models showing higher attack success than base models or prompts. Furthermore, these false beliefs then cascade to downstream judges, achieving over 60% deception rates, highlighting a socio-technical vulnerability in how LLM-based agents interact with dynamic information environments. Our implementation and data are available at: https://github.com/CharlesJW222/Lying_with_Truth/tree/main.

cs / cs.CL / cs.AI / cs.MA