Published: 2025/11/8 4:34:52

Let's measure how well multilingual LLMs understand discourse! ✨

Super-short summary: DiscoTrack, a test that measures how smart multilingual LLMs really are, has arrived! 🌍✨

Gyaru-style sparkle points ✨

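● Covers 12 languages! Just like a globe-trotting gyaru 💖
● A fresh kind of test that checks whether a model can grasp the meaning of a whole text! 🧐
● Research that speeds up the evolution of AI, with a future to look forward to! 🚀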

Detailed explanation


DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking

Lanni Bu / Lauren Levine / Amir Zeldes

Recent LLM benchmarks have tested models on a range of phenomena, but are still focused primarily on natural language understanding for extraction of explicit information, such as QA or summarization, with responses often targeting information from individual sentences. We are still lacking more challenging, and importantly also multilingual, benchmarks focusing on implicit information and pragmatic inferences across larger documents in the context of discourse tracking: integrating and aggregating information across sentences, paragraphs and multiple speaker utterances. To this end, we present DiscoTrack, an LLM benchmark targeting a range of tasks across 12 languages and four levels of discourse understanding: salience recognition, entity tracking, discourse relations and bridging inference. Our evaluation shows that these tasks remain challenging, even for state-of-the-art models.

cs / cs.CL