The Ultimate Gal AI Is Born! Image Caption Evaluation, Totally Leveling Up!? 🚀💕

Published: 2025/12/16 14:06:35


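Title & Super-Short Summary: DISCODE! Sending image caption evaluation through the roof! Simply the best 💖
Gal-Style Sparkle Points ✨ ×3
● Scores that look just like human ratings! Caption evaluation gets way more trustworthy~!
● No fine-tuning (extra training of the model) needed! So easy 🎵
● Works great on images from all kinds of genres (domains)! 🌟
Detailed Explanation
• Background: Tech that automatically adds descriptions (captions) to images is super trendy, right? But when AI scores those captions, its ratings often don't match how humans would rate them, you know? 😥 This research shrinks that gap so caption quality can be judged more reliably! Seems like it could come in handy for all sorts of services, right?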

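• Method: They use a new method called DISCODE! It adjusts the scores that an LVLM (a large vision-language model, aka a really capable AI) gives to captions so they land closer to human ratings! ✨ Specifically, it uses something called the Adaptive Test-Time (ATT) loss, which leans on a Gaussian prior distribution (a bell curve describing where scores are expected to fall) and is minimized at test time with a closed-form solution! Working across all kinds of domains without any fine-tuning is what makes it so cool!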
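To make the "pull the scores toward a Gaussian prior" idea a bit more concrete, here is a tiny sketch. Heads up: this is NOT the actual ATT loss from the paper, since the summary doesn't give its formula. It assumes a stand-in objective, KL(q‖p) + λ·KL(q‖prior), which happens to have a closed-form minimizer, and every name in it (`gaussian_prior`, `adapt_distribution`, `lam`, the 1 to 5 score scale, the example probabilities) is made up for illustration.

```python
import math

def gaussian_prior(scores, mean, std):
    """Discretized Gaussian over the candidate scores, normalized to sum to 1."""
    weights = [math.exp(-0.5 * ((s - mean) / std) ** 2) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def adapt_distribution(p, prior, lam):
    """Closed-form minimizer of KL(q||p) + lam * KL(q||prior) over distributions q.

    ASSUMPTION: this objective is an illustrative stand-in, not the paper's loss.
    The minimizer is the normalized geometric mixture:
    q_i proportional to p_i^(1/(1+lam)) * prior_i^(lam/(1+lam)).
    """
    a = 1.0 / (1.0 + lam)
    b = lam / (1.0 + lam)
    unnormalized = [(pi ** a) * (gi ** b) for pi, gi in zip(p, prior)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

def expected_score(scores, q):
    """Turn a distribution over discrete scores into one scalar score."""
    return sum(s * qi for s, qi in zip(scores, q))

# Hypothetical example: a model that is overconfident about the top score.
scores = [1, 2, 3, 4, 5]
raw_probs = [0.02, 0.03, 0.05, 0.10, 0.80]
prior = gaussian_prior(scores, mean=3.0, std=1.0)

adapted = adapt_distribution(raw_probs, prior, lam=0.5)
raw = expected_score(scores, raw_probs)
adjusted = expected_score(scores, adapted)
print(f"raw score: {raw:.3f}, adjusted score: {adjusted:.3f}")
```

With `lam=0` the prior is ignored and the raw distribution comes back unchanged; larger `lam` pulls the result harder toward the prior. No training step is involved, which is the spirit of a finetuning-free, test-time adjustment.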


DISCODE: Distribution-Aware Score Decoder for Robust Automatic Evaluation of Image Captioning

Nakamasa Inoue / Kanoko Goto / Masanari Oi / Martyna Gruszka / Mahiro Ukai / Takumi Hirose / Yusuke Sekikawa

Large vision-language models (LVLMs) have shown impressive performance across a broad range of multimodal tasks. However, robust image caption evaluation using LVLMs remains challenging, particularly under domain-shift scenarios. To address this issue, we introduce the Distribution-Aware Score Decoder (DISCODE), a novel finetuning-free method that generates robust evaluation scores better aligned with human judgments across diverse domains. The core idea behind DISCODE lies in its test-time adaptive evaluation approach, which introduces the Adaptive Test-Time (ATT) loss, leveraging a Gaussian prior distribution to improve robustness in evaluation score estimation. This loss is efficiently minimized at test time using an analytical solution that we derive. Furthermore, we introduce the Multi-domain Caption Evaluation (MCEval) benchmark, a new image captioning evaluation benchmark covering six distinct domains, designed to assess the robustness of evaluation metrics. In our experiments, we demonstrate that DISCODE achieves state-of-the-art performance as a reference-free evaluation metric across MCEval and four representative existing benchmarks.
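The abstract does not say how "aligned with human judgments" is measured. A common choice for caption metrics is a rank correlation between metric scores and human ratings, so the sketch below assumes Kendall's tau (the simple tau-a variant, where tied pairs count as neither concordant nor discordant). The function name and the example numbers are hypothetical, and a given benchmark may use a different variant or statistic.

```python
from itertools import combinations

def kendall_tau_a(metric_scores, human_scores):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two items to compare")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        dm = metric_scores[i] - metric_scores[j]
        dh = human_scores[i] - human_scores[j]
        if dm * dh > 0:
            concordant += 1  # metric and humans order this pair the same way
        elif dm * dh < 0:
            discordant += 1  # metric and humans disagree on this pair
        # ties in either list contribute to neither count
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical ratings for four captions.
human = [1, 2, 3, 4]
metric_a = [0.10, 0.40, 0.35, 0.90]  # mostly agrees with the human ordering
metric_b = [0.90, 0.10, 0.50, 0.30]  # mostly disagrees

tau_a = kendall_tau_a(metric_a, human)
tau_b = kendall_tau_a(metric_b, human)
print(f"metric_a: {tau_a:.3f}, metric_b: {tau_b:.3f}")
```

A value near 1 means the metric ranks captions the way human raters do, a value near -1 means it ranks them in the opposite order, and a value near 0 means the two orderings are unrelated.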

cs / cs.CV / cs.AI
