Ultra-quick summary: This research uses LLMs to find the weak spots of FSCIL (AI that learns smartly from only a little data) and make it even stronger!
✨ Sparkly Gal Highlight Points ✨
● FSCIL means AI that can get smart from just a little data 💖
● They use LLMs (large language models) to automatically discover attack methods 😎
● The goal is to overcome the AI's weaknesses and build safer AI!
Here comes the detailed explanation~ 💖
Background: FSCIL is AI that can learn all sorts of things from only a little data. But it has a weakness: it's vulnerable to adversarial attacks! 😭 So this research uses LLMs to automatically discover attack methods that hit FSCIL right in its weak spots!
Few-shot class incremental learning (FSCIL) is a more realistic and challenging paradigm in continual learning, in which a model must incrementally learn unseen classes from only a few training examples while overcoming catastrophic forgetting on base classes. Previous efforts have primarily centered on designing more effective FSCIL approaches; by contrast, far less attention has been devoted to the security issues underlying FSCIL. This paper aims to provide a holistic study of the impact of attacks on FSCIL. We first derive insights by systematically exploring how human expert-designed attack methods (i.e., PGD, FGSM) affect FSCIL. We find that those methods either fail to attack base classes or incur heavy labor costs because they rely on extensive expert knowledge. This highlights the need for a specialized attack method for FSCIL. Grounded in these insights, we propose a simple yet effective method, ACraft, that leverages Large Language Models (LLMs) to automatically steer and discover optimal attack methods targeted at FSCIL, without human experts. Moreover, to improve the reasoning of LLMs about FSCIL, we introduce a novel Proximal Policy Optimization (PPO)-based reinforcement learning scheme that establishes positive feedback, making LLMs generate better attack methods in each subsequent generation. Experiments on mainstream benchmarks show that ACraft significantly degrades the performance of state-of-the-art FSCIL methods and dramatically outperforms human expert-designed attack methods while maintaining the lowest attack costs.
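For context on the expert-designed baselines the paper probes first, here is a minimal PyTorch sketch of FGSM and PGD; the `model` interface, the `epsilon`/`alpha` values, and the [0, 1] input range are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """One-step FGSM: perturb the input along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

def pgd_attack(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """PGD: iterated FGSM steps, each projected back into the eps-ball around x."""
    x = x.clone().detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # project back into the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                     # stay in the valid pixel range
    return x_adv.detach()
```

In the paper's framing, these baselines either fail to attack base classes or demand exactly this kind of per-setting expert hand-tuning (step sizes, iteration counts, budgets), which is the gap ACraft's LLM-driven attack search is meant to close.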