Image Understanding That Even a Gyaru Can Get! ✨

Published: 2025/12/25 13:35:52

Title & ultra-short summary: Image Understanding That Even a Gyaru Can Get! ✨

This research built a new framework (UniPercept-Bench) so that AI can actually get things like how beautiful an image is! 💖

✨ Gyaru-style sparkle points ✨

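● It can now rate the "emo" vibe of an image, which existing AI just couldn't get 🫶
● It scores images in fine detail, paying attention to "aesthetics," "quality," "structure," and "texture"!
● It means the quality of AI-generated images can get way, way better 💎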

Detailed explanation

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Shuo Cao / Jiayang Li / Xiaohui Li / Yuandong Pu / Kaiwen Zhu / Yuanting Gao / Siqi Luo / Yi Xin / Qi Qin / Yu Zhou / Xiangyu Chen / Wenlong Zhang / Bin Fu / Yu Qiao / Yihao Liu

Multimodal large language models (MLLMs) have achieved remarkable progress in visual understanding tasks such as visual grounding, segmentation, and captioning. However, their ability to perceive perceptual-level image features remains limited. In this work, we present UniPercept-Bench, a unified framework for perceptual-level image understanding across three key domains: Aesthetics, Quality, and Structure and Texture. We establish a hierarchical definition system and construct large-scale datasets to evaluate perceptual-level image understanding. Based on this foundation, we develop a strong baseline UniPercept trained via Domain-Adaptive Pre-Training and Task-Aligned RL, enabling robust generalization across both Visual Rating (VR) and Visual Question Answering (VQA) tasks. UniPercept outperforms existing MLLMs on perceptual-level image understanding and can serve as a plug-and-play reward model for text-to-image generation. This work defines Perceptual-Level Image Understanding in the era of MLLMs and, through the introduction of a comprehensive benchmark together with a strong baseline, provides a solid foundation for advancing perceptual-level multimodal image understanding.

cs / cs.CV
