Ultra-short summary: A new way to measure cultural representation in T2I models is here! Cheering on content creation that's full of diversity📣
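🌟 Gal-style sparkle points✨
● A benchmark for checking cultural bias has arrived!
● It can evaluate cultural activities like greetings and dining, which is awesome!
● For the IT industry, it means being able to make content that people from many more countries can enjoy💖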
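Here comes the detailed breakdown~!
Background: Recent image-generation AI (T2I models) is amazing, but its cultural representation has been skewed! In particular, because the models were trained on data that leans toward the cultures of a few countries, it was hard for them to depict other countries' cultures accurately💦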
Text-to-image (T2I) diffusion models achieve impressive photorealism by training on large-scale web data, but they inherit cultural biases and fail to depict underrepresented regions faithfully. Existing cultural benchmarks focus mainly on object-centric categories (e.g., food, attire, and architecture), overlooking the social and daily activities that more clearly reflect cultural norms. Moreover, few metrics exist for measuring cultural faithfulness. We introduce CULTIVate, a benchmark for evaluating T2I models on cross-cultural activities (e.g., greetings, dining, games, traditional dances, and cultural celebrations). CULTIVate spans 16 countries with 576 prompts and more than 19,000 images, and provides an explainable descriptor-based evaluation framework across multiple cultural dimensions, including background, attire, objects, and interactions. We propose four metrics that measure cultural alignment, hallucination, exaggerated elements, and diversity. Our findings reveal systematic disparities: models perform better for Global North countries than for Global South countries, with distinct failure modes across T2I systems. Human studies confirm that our metrics correlate more strongly with human judgments than existing text-image metrics do.