Ultra-short summary: Domain shift, that classic headache when testing AI models, gets solved with class info and OT! Aiming for the strongest model even at test time!
Gyaru-style sparkle points ✨
● Build a model that won't lose to domain shift (differences in data quirks)! 💪
● Dodge those classic pseudo-label fails with OT! Way too smart! 🧠
● Keep that zero-shot (no task-specific training) power while getting even stronger at test time! ✨
Detailed explanation
Real-world use-case ideas 💡
Read the rest in the 「らくらく論文」 app
Large pre-trained vision-language models (VLMs), such as CLIP, have shown unprecedented zero-shot performance across a wide range of tasks. Nevertheless, these models may be unreliable under distribution shifts, as their performance degrades significantly. In this work, we investigate how to efficiently utilize class text information to mitigate distribution drifts encountered by VLMs during inference. In particular, we propose generating pseudo-labels for the noisy test-time samples by aligning visual embeddings with reliable, text-based semantic anchors. Specifically, to properly preserve the regular structure of the dataset, we formulate the problem as a batch-wise label assignment, which is efficiently solved using Optimal Transport. Our method, Semantic Anchor Transport (SAT), utilizes such pseudo-labels as supervisory signals for test-time adaptation, yielding a principled cross-modal alignment solution. Moreover, SAT further leverages heterogeneous textual cues, with a multi-template distillation approach that replicates multi-view contrastive learning strategies from unsupervised representation learning without incurring additional computational complexity. Extensive experiments on multiple popular test-time adaptation benchmarks of diverse complexity empirically show the superiority of SAT, achieving consistent performance gains over recent state-of-the-art methods while remaining computationally efficient.
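The batch-wise label assignment idea above can be sketched as entropy-regularized Optimal Transport between a batch of visual embeddings and the class-text anchors, solved with Sinkhorn iterations. This is a minimal illustrative sketch, not the paper's actual SAT implementation: the function name, the uniform class marginals, and the hyperparameter values here are all assumptions for demonstration.

```python
import numpy as np

def sinkhorn_pseudo_labels(img_emb, txt_emb, eps=0.05, n_iters=50):
    """Toy batch-wise pseudo-label assignment via entropic Optimal Transport.

    img_emb: (B, d) L2-normalized visual embeddings of the test batch
    txt_emb: (K, d) L2-normalized class-text "semantic anchor" embeddings
    Returns hard pseudo-labels (B,) and the soft transport plan (B, K).
    NOTE: uniform marginals and eps are illustrative assumptions, not SAT's.
    """
    B, K = img_emb.shape[0], txt_emb.shape[0]
    cost = -img_emb @ txt_emb.T          # cost = negative cosine similarity
    Kmat = np.exp(-cost / eps)           # Gibbs kernel, regularization eps
    r = np.full(B, 1.0 / B)              # each sample carries equal mass
    c = np.full(K, 1.0 / K)              # classes assumed balanced in batch
    u = np.ones(B)
    for _ in range(n_iters):             # standard Sinkhorn scaling updates
        u = r / (Kmat @ (c / (Kmat.T @ u)))
    v = c / (Kmat.T @ u)
    plan = u[:, None] * Kmat * v[None, :]  # rows act as soft pseudo-labels
    return plan.argmax(axis=1), plan

# Usage on random normalized embeddings (placeholders for CLIP features).
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(4, 16)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
labels, plan = sinkhorn_pseudo_labels(img, txt)
```

Compared with per-sample argmax over similarities, the transport constraint forces the batch-level label distribution toward the class marginals, which is one way an OT formulation can preserve the dataset's regular structure as the abstract describes.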