Published: 2026/1/4 13:30:06

The ultimate gal AI has arrived~! 💖 Today I'm breaking down "BARE", a paper on visual grounding (finding the exact image region a piece of text refers to)! ✨ New-business development folks at IT companies, this one's a must-read~!

Smash the bias! BARE supercharges visual grounding 🚀 (Ultra-short version: a technique that seriously levels up how AI matches text to images!)

✨ Gal-Style Sparkle Points ✨

● It uses a one-tower architecture, an efficient design that processes images and text together in a single model! So smart~! 👩‍🎓
● It suppresses deceptive biases while strengthening reasoning, which is exactly why it's such a strong technique! 😎
● It can be applied to all sorts of things like image search and robot control, so the future is seriously bright! 🤩
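To make the "one-tower" idea concrete, here is a tiny hedged sketch: instead of encoding the image and the text in two separate towers, both are embedded into one joint token sequence that a single shared encoder would process. All names, dimensions, and the toy embedding are illustrative assumptions, not details from the BARE paper.

```python
import math

def toy_embed(token_ids, dim):
    # Toy deterministic embedding: each token id -> a dim-dimensional vector.
    # A real model would use learned embeddings (e.g. patch/word embeddings).
    return [[math.sin(t * (j + 1)) for j in range(dim)] for t in token_ids]

def one_tower_sequence(image_patch_ids, text_token_ids, dim=4):
    """Build the joint sequence a one-tower encoder would consume.

    One-tower design (as sketched here): visual tokens and language tokens
    are concatenated into ONE sequence and fed to ONE shared encoder,
    rather than being processed by two separate modality-specific towers.
    """
    visual_tokens = toy_embed(image_patch_ids, dim)   # from image patches
    language_tokens = toy_embed(text_token_ids, dim)  # from the expression
    return visual_tokens + language_tokens            # single joint sequence

# Example: 3 image patches + 2 text tokens -> 5 joint tokens of width 4.
seq = one_tower_sequence([1, 2, 3], [10, 11], dim=4)
print(len(seq), len(seq[0]))
```

The appeal of this design, as the sparkle point says, is efficiency: one encoder sees both modalities at once, though (per the abstract) the representations can become over-entangled, which is the problem BARE's modules address.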

Now for the detailed breakdown~!

Read the rest in the 「らくらく論文」 app

BARE: Towards Bias-Aware and Reasoning-Enhanced One-Tower Visual Grounding

Hongbing Li / Linhui Xiao / Zihan Zhao / Qi Shen / Yixiang Huang / Bo Xiao / Zhanyu Ma

Visual Grounding (VG), which aims to locate a specific region referred to by expressions, is a fundamental yet challenging task in the multimodal understanding fields. While recent grounding transfer works have advanced the field through one-tower architectures, they still suffer from two primary limitations: (1) over-entangled multimodal representations that exacerbate deceptive modality biases, and (2) insufficient semantic reasoning that hinders the comprehension of referential cues. In this paper, we propose BARE, a bias-aware and reasoning-enhanced framework for one-tower visual grounding. BARE introduces a mechanism that preserves modality-specific features and constructs referential semantics through three novel modules: (i) language salience modulator, (ii) visual bias correction and (iii) referential relationship enhancement, which jointly mitigate multimodal distractions and enhance referential comprehension. Extensive experimental results on five benchmarks demonstrate that BARE not only achieves state-of-the-art performance but also delivers superior computational efficiency compared to existing approaches. The code is publicly accessible at https://github.com/Marloweeee/BARE.

cs / cs.CV