Published: 2025/8/22 20:49:15

Multi-modal sarcasm detection has arrived! 🎉

Ultra-short summary: the latest AI that exposes sarcasm from images plus text! 😎

Gyaru-style sparkle points ✨

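● Reading sarcasm out of both images and text, how amazing is that? 👀✨
● A new framework called "IRONIC" is apparently really good at it 🎵
● It might even help with things like handling social media flame wars! 🔥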

Detailed explanation

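Background

Getting AI to understand the "sarcasm" you see all over social media has been a tough problem! Some sarcasm you just can't catch unless you look at the image together with the text, right? 😂 Until now, AI was bad at exactly that!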

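Method

The key is getting the AI to understand "coherence"! The text and image contradicting each other, or making a funny combo... they set things up so the AI picks up on those classic sarcasm patterns ✨ Apparently they use a framework called "IRONIC", which guides the model through in-context learning!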

IRONIC: Coherence-Aware Reasoning Chains for Multi-Modal Sarcasm Detection

Aashish Anantha Ramakrishnan / Aadarsh Anantha Ramakrishnan / Dongwon Lee

Interpreting figurative language such as sarcasm across multi-modal inputs presents unique challenges, often requiring task-specific fine-tuning and extensive reasoning steps. However, current Chain-of-Thought approaches do not efficiently leverage the same cognitive processes that enable humans to identify sarcasm. We present IRONIC, an in-context learning framework that leverages Multi-modal Coherence Relations to analyze referential, analogical and pragmatic image-text linkages. Our experiments show that IRONIC achieves state-of-the-art performance on zero-shot Multi-modal Sarcasm Detection across different baselines. This demonstrates the need for incorporating linguistic and cognitive insights into the design of multi-modal reasoning strategies. Our code is available at: https://github.com/aashish2000/IRONIC
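As a rough illustration of the idea in the abstract, the sketch below builds a two-step prompt that first asks a model to reason over the three named image-text relations (referential, analogical, pragmatic) and then to give a sarcasm label. This is not the authors' implementation: the relation questions, the `Post` type, the prompt wording, the label strings, and the `build_prompt` and `parse_label` helpers are all hypothetical, and a text description stands in for the actual image input. The real prompts are in the linked repository.

```python
from dataclasses import dataclass

# Relation names come from the abstract. The questions attached to them are
# illustrative assumptions, not the paper's definitions.
COHERENCE_RELATIONS = {
    "referential": "Does the text refer to something visible in the image?",
    "analogical": "Does the text compare the image content to something else?",
    "pragmatic": "Does the intended meaning differ from the literal reading?",
}


@dataclass
class Post:
    text: str
    image_description: str  # hypothetical stand-in for the real image input


def build_prompt(post: Post) -> str:
    """Assemble a coherence-first reasoning prompt (hypothetical wording)."""
    lines = [
        "You are given a social media post with an image.",
        f"Image: {post.image_description}",
        f"Text: {post.text}",
        "",
        "Step 1. Analyze each image-text coherence relation:",
    ]
    for name, question in COHERENCE_RELATIONS.items():
        lines.append(f"- {name}: {question}")
    lines += [
        "",
        "Step 2. On the final line, answer with exactly one label: "
        "sarcastic or not_sarcastic.",
    ]
    return "\n".join(lines)


def parse_label(response: str) -> bool:
    """Return True only if the last line of the response says 'sarcastic'."""
    lines = response.strip().splitlines()
    if not lines:
        return False
    last = lines[-1].lower()
    # Check the negative label first, since it contains the positive one.
    return "not_sarcastic" not in last and "sarcastic" in last


if __name__ == "__main__":
    example = Post(
        text="Great weather for a picnic",
        image_description="A flooded park in heavy rain",
    )
    print(build_prompt(example))
```

The resulting string could be sent to any vision-language model. Which model to use and how to pass the image are left open here because they depend on the API in use.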

cs / cs.CL / cs.AI / cs.CV
