Ultra-short summary: How to spot how good an LLM really is! Low cost AND a big accuracy boost 💖
🌟 Sparkly highlight points ● Checks LLMs in fine detail 💖 It really verifies whether outputs follow the instructions! ● Super-fast evaluation! It even helps cut costs 🎵 ● Better than other AIs! IF-CRITIC is seriously awesome, right? 😎
Detailed explanation
Background: LLMs (Large Language Models) can follow all kinds of instructions and produce amazing text, but judging whether they actually followed those instructions was hard 😔 Conventional evaluation methods cost a lot of money and time, and their accuracy was so-so…
Method: Enter IF-CRITIC! It breaks each instruction down into a fine-grained checklist and evaluates the items one by one! ✨ Evaluation data built by experts boosts accuracy even further, and filtering makes it even more reliable!
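The checklist idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: all names (`ConstraintCheck`, `aggregate_score`) and the simple pass/fail aggregation are assumptions for explanation purposes only.

```python
from dataclasses import dataclass

@dataclass
class ConstraintCheck:
    constraint: str   # one requirement decomposed from the instruction
    satisfied: bool   # the critic's verdict for this single constraint
    critique: str     # short explanation of the verdict

def aggregate_score(checks: list[ConstraintCheck]) -> float:
    """Fraction of constraints satisfied -> a fine-grained reward signal."""
    if not checks:
        return 0.0
    return sum(c.satisfied for c in checks) / len(checks)

# Toy example: an instruction decomposed into two constraints.
checks = [
    ConstraintCheck("Write exactly 3 bullet points", True, "Output has 3 bullets"),
    ConstraintCheck("Use a formal tone", False, "Tone is casual"),
]
print(aggregate_score(checks))  # 0.5
```

Evaluating constraint by constraint like this is what makes the feedback fine-grained: a response gets partial credit per satisfied constraint instead of a single vague overall score.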
Instruction-following is a fundamental ability of Large Language Models (LLMs), requiring their generated outputs to follow multiple constraints imposed in input instructions. Numerous studies have attempted to enhance this ability through preference optimization or reinforcement learning based on reward signals from LLM-as-a-Judge. However, existing evaluation models for instruction-following still possess many deficiencies, such as substantial costs and unreliable assessments. To this end, we propose IF-CRITIC, an LLM critic for fine-grained, efficient, and reliable instruction-following evaluation. We first develop a checklist generator to decompose instructions and generate constraint checklists. With the assistance of the checklists, we collect high-quality critique training data through a multi-stage critique filtering mechanism and employ a constraint-level preference optimization method to train IF-CRITIC. Extensive experiments show that the evaluation performance of IF-CRITIC can beat strong LLM-as-a-Judge baselines, including o4-mini and Gemini-3-Pro. With the reward signals provided by IF-CRITIC, LLMs can achieve substantial performance gains in instruction-following optimization under lower computational overhead compared to strong LLM critic baselines.
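The multi-stage critique filtering mentioned in the abstract can be sketched as follows. This is a hedged illustration under assumed details: the function names and the "keep a sampled critique only if its per-constraint verdicts agree with reference labels" criterion are hypothetical simplifications, not the paper's exact mechanism.

```python
def verdicts_match(critique_verdicts: dict[str, bool],
                   reference_verdicts: dict[str, bool]) -> bool:
    """One filtering stage: every constraint-level verdict must match the reference."""
    return (critique_verdicts.keys() == reference_verdicts.keys()
            and all(critique_verdicts[k] == reference_verdicts[k]
                    for k in reference_verdicts))

def filter_critiques(samples: list[dict[str, bool]],
                     reference: dict[str, bool]) -> list[dict[str, bool]]:
    """Keep only sampled critiques consistent with the reference labels."""
    return [s for s in samples if verdicts_match(s, reference)]

# Toy example: two sampled critiques of the same response.
reference = {"length <= 100 words": True, "mentions the keyword": False}
samples = [
    {"length <= 100 words": True, "mentions the keyword": False},  # agrees -> kept
    {"length <= 100 words": True, "mentions the keyword": True},   # disagrees -> dropped
]
print(len(filter_critiques(samples, reference)))  # 1
```

Filtering out inconsistent critiques before training is one plausible way to obtain the "high-quality critique training data" the abstract describes; the surviving critiques can then serve as preference data at the constraint level.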