Published: 2025/10/23 9:43:43

The ultimate gyaru AI has arrived~! ✨ This time, I'm giving you a cute, easy breakdown of a paper on recipe image retrieval!

Blasting the bias out of recipe retrieval! 🎉 (That's searching for a recipe from a food photo!)

1. Super Summary: This research uses AI to fix the retrieval mismatches caused by cultural differences when you search for a recipe from a photo!

2. Gyaru Sparkle Points ✨
● It doesn't just look at how the dish appears; it also takes seasoning and cooking methods into account! 🥺
● By focusing on causal relationships (cause and effect), the AI gets smarter! 🔍
● Searching for dishes from other countries might finally work properly! 🌎

3. Detailed Explanation
Background: Searching for a recipe from a photo of a dish is super convenient, right? 📱 But when dishes from all kinds of countries get mixed together, the search results can come out kinda meh~… you know the feeling? 💔 Food isn't just about looks; the seasoning and the way it's made matter too, right?


Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval

Qing Wang / Chong-Wah Ngo / Yu Cao / Ee-Peng Lim

Existing approaches for image-to-recipe retrieval have the implicit assumption that a food image can fully capture the details textually documented in its recipe. However, a food image only reflects the visual outcome of a cooked dish and not the underlying cooking process. Consequently, learning cross-modal representations to bridge the modality gap between images and recipes tends to ignore subtle, recipe-specific details that are not visually apparent but are crucial for recipe retrieval. Specifically, the representations are biased to capture the dominant visual elements, resulting in difficulty in ranking similar recipes with subtle differences in use of ingredients and cooking methods. The bias in representation learning is expected to be more severe when the training data is a mix of images and recipes sourced from different cuisines. This paper proposes a novel causal approach that predicts the culinary elements potentially overlooked in images, while explicitly injecting these elements into cross-modal representation learning to mitigate biases. Experiments are conducted on the standard monolingual Recipe1M dataset and a newly curated multilingual multicultural cuisine dataset. The results indicate that the proposed causal representation learning is capable of uncovering subtle ingredients and cooking actions and achieves impressive retrieval performance on both monolingual and multilingual multicultural datasets.
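To make the idea concrete, here is a minimal Python (PyTorch) sketch of the kind of "predict and inject" setup the abstract describes: an auxiliary head guesses culinary elements (ingredients, cooking actions) that the photo alone may not show, and those predicted elements are folded back into the image embedding before the contrastive retrieval loss. All module names, dimensions, and the element vocabulary below are illustrative assumptions, not the authors' implementation.

# Sketch only: predict overlooked culinary elements from the image embedding
# and inject them into the shared retrieval space (assumed design, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasMitigatedRetriever(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, embed_dim=512, num_elements=1000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)   # image encoder output -> shared space
        self.txt_proj = nn.Linear(txt_dim, embed_dim)   # recipe encoder output -> shared space
        # Auxiliary predictor: which culinary elements (ingredients, cooking actions)
        # are likely present but not visually apparent.
        self.element_head = nn.Linear(embed_dim, num_elements)
        # Learned embedding table for the (hypothetical) culinary-element vocabulary.
        self.element_emb = nn.Embedding(num_elements, embed_dim)

    def forward(self, img_feat, txt_feat):
        img = self.img_proj(img_feat)                   # (B, D)
        txt = self.txt_proj(txt_feat)                   # (B, D)
        element_logits = self.element_head(img)         # (B, V) soft element predictions
        weights = torch.sigmoid(element_logits)
        # Inject a weighted sum of element embeddings into the image representation.
        injected = weights @ self.element_emb.weight    # (B, D)
        img = F.normalize(img + injected, dim=-1)
        txt = F.normalize(txt, dim=-1)
        return img, txt, element_logits

def training_losses(img, txt, element_logits, element_targets, temperature=0.07):
    """Bidirectional contrastive retrieval loss plus multi-label supervision
    of the element head (targets taken from the paired recipe text)."""
    logits = img @ txt.t() / temperature                # (B, B) similarity matrix
    labels = torch.arange(img.size(0), device=img.device)
    retrieval = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
    element = F.binary_cross_entropy_with_logits(element_logits, element_targets)
    return retrieval + element

# Toy usage with random features standing in for real image / recipe encoders.
model = BiasMitigatedRetriever()
img_feat = torch.randn(8, 2048)
txt_feat = torch.randn(8, 768)
targets = (torch.rand(8, 1000) > 0.99).float()          # sparse multi-hot element labels
img, txt, logits = model(img_feat, txt_feat)
loss = training_losses(img, txt, logits, targets)
loss.backward()

The design intent mirrored here is simply that the image side is not asked to carry all recipe detail on its own: non-visual cues are supervised from the recipe text and then added back, which is one plausible way to reduce the bias toward dominant visual elements that the abstract describes.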

cs / cs.CV / cs.MM