Title & Super Summary: Exposing a weak point of LLM peer review! They found a sneaky trick that can tweak a paper's score without touching its content! A must-know for IT companies ✨
Gal-Style Sparkle Points ✨
● The era of LLMs (AI) reviewing (checking) papers has arrived!
● They discovered an attack that raises or lowers a paper's score without changing what it claims!
● IT companies, you need to know this! Security countermeasures are a must!
Detailed Explanation
Real-World Use-Case Ideas 💡
The use of large language models (LLMs) in peer review systems has attracted growing attention, making it essential to examine their potential vulnerabilities. Prior attacks rely on prompt injection, which alters manuscript content and conflates injection susceptibility with evaluation robustness. We propose the Paraphrasing Adversarial Attack (PAA), a black-box optimization method that searches for paraphrased sequences yielding higher review scores while preserving semantic equivalence and linguistic naturalness. PAA leverages in-context learning, using previous paraphrases and their scores to guide candidate generation. Experiments across five ML and NLP conferences with three LLM reviewers and five attacking models show that PAA consistently increases review scores without changing the paper's claims. Human evaluation confirms that generated paraphrases maintain meaning and naturalness. We also find that attacked papers exhibit increased perplexity in reviews, offering a potential detection signal, and that paraphrasing submissions can partially mitigate attacks.
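To make the attack loop concrete, here is a minimal Python sketch of a black-box, in-context-learning search in the spirit of PAA. Everything here is an illustrative assumption rather than the paper's actual implementation: `reviewer_lm` and `paraphrase_lm` are toy stand-ins for real LLM API calls, and the greedy hill climb is just one plausible way to use a scored paraphrase history to guide candidate generation.

```python
import random

def reviewer_lm(paper_text: str) -> float:
    """Toy stand-in for an LLM reviewer: returns a deterministic 1-10 score.
    In the real setting this would be an API call to the reviewing model."""
    rng = random.Random(hash(paper_text))
    return round(rng.uniform(3.0, 9.0), 1)

def paraphrase_lm(text: str, history: list[tuple[str, float]]) -> str:
    """Toy stand-in for the attacking model. In PAA, the prompt would include
    previous (paraphrase, score) pairs as in-context examples, so the model
    learns which phrasings the reviewer rewards. A real attack must also keep
    each paraphrase semantically equivalent to the original."""
    return text + " " + random.choice(["Notably,", "Importantly,", "Crucially,"])

def paa_attack(paper: str, n_rounds: int = 10, n_candidates: int = 4) -> tuple[str, float]:
    best_text, best_score = paper, reviewer_lm(paper)
    history = [(paper, best_score)]  # scored paraphrases fed back as in-context examples
    for _ in range(n_rounds):
        # Black-box setting: we only observe the reviewer's score, never gradients.
        candidates = [paraphrase_lm(best_text, history) for _ in range(n_candidates)]
        for cand in candidates:
            score = reviewer_lm(cand)
            history.append((cand, score))
            if score > best_score:  # greedy hill climb on the review score
                best_text, best_score = cand, score
    return best_text, best_score

if __name__ == "__main__":
    text, score = paa_attack("We propose a method for efficient model training.")
    print(f"best score found: {score}")
```

The design point worth noticing: each round conditions the paraphraser on the full scored history, which is what turns a naive random search into in-context-guided optimization.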
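The perplexity finding also lends itself to a short sketch. The snippet below computes the perplexity of a review with GPT-2 via Hugging Face `transformers`; the model choice and the threshold idea are assumptions on my part — the paper only reports that reviews of attacked papers show elevated perplexity, which could plausibly be calibrated into a flagging rule.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Assumption: GPT-2 as the scoring model; the paper does not specify this setup.
def review_perplexity(review_text: str, model_name: str = "gpt2") -> float:
    tok = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()
    enc = tok(review_text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(input_ids=enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()  # perplexity = exp(mean cross-entropy)

# Hypothetical usage: THRESHOLD would be calibrated on reviews of clean papers.
# if review_perplexity(review) > THRESHOLD: flag_for_human_check(review)
```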