LLM Evaluation, India Edition! DIWALI Is Born 🎉

Published: 2025/12/16 7:09:46

LLM Evaluation, India Edition! DIWALI Is Born 🎉 (Super-short summary: a thorough look at how well LLMs handle Indian culture!)

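1. Gyaru-style sparkle points ✨
● They went and built "DIWALI", an evaluation dataset dedicated to Indian culture! 🇮🇳
● You can see at a glance what LLMs are good and bad at for each region of India 👀
● A red-hot piece of research that looks set to matter a lot in the IT industry 🔥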

2. Detailed explanation

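Background: LLMs (large language models) are amazing, but they're still a bit weak on Indian culture… 😭 So this research kicked off with the goal of getting to LLMs that really understand Indian culture!
Method: They developed "DIWALI", a dataset that covers Indian culture in serious detail! ✨ Clothing, food, festivals… about 8k cultural concepts across 17 cultural facets and 36 sub-regions! Then they use it to grade how well LLMs do on a cultural text adaptation task, alongside an LLM as judge and human evaluators!
Results: Evaluating LLMs with DIWALI showed that their strengths and weaknesses depend on the region! 👀 Every model tested covered only some sub-regions and made mostly surface-level adaptations. Given how diverse India is, maybe that's no surprise!
Significance (the seriously cool ♡ point): If LLMs get better at Indian culture, services for the Indian market can get better and better! 🎉 We might get great services that really feel close to the people who live there 😍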

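3. Real-world use ideas 💡
● An India travel app that serves up information tailored to each region's culture! 📱
● An AI chatbot that picks expressions to suit your Indian friend when you chat with them! 💬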

4. For those who want to dig deeper 🔍
● LLM (large language model)
● Indian culture
● Localization


DIWALI: Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context

Pramit Sahoo / Maharaj Brahma / Maunendra Sankar Desarkar

Large language models (LLMs) are widely used in various tasks and applications. However, despite their wide capabilities, they have been shown to lack cultural alignment (Ryan et al., 2024; AlKhamissi et al., 2024) and to produce biased generations (Naous et al., 2024) due to a lack of cultural knowledge and competence. Evaluation of LLMs for cultural awareness and alignment is particularly challenging due to the lack of proper evaluation metrics and the unavailability of culturally grounded datasets representing the vast complexity of cultures at the regional and sub-regional levels. Existing datasets for culture-specific items (CSIs) focus primarily on concepts at the regional level and may contain false positives. To address this issue, we introduce a novel CSI dataset for Indian culture, belonging to 17 cultural facets. The dataset comprises ~8k cultural concepts from 36 sub-regions. To measure the cultural competence of LLMs on a cultural text adaptation task, we evaluate the adaptations using the CSIs created, LLM as Judge, and human evaluations from diverse socio-demographic regions. Furthermore, we perform quantitative analysis demonstrating selective sub-regional coverage and surface-level adaptations across all considered LLMs. Our dataset is available here: https://huggingface.co/datasets/nlip/DIWALI, project webpage https://nlip-lab.github.io/nlip/publications/diwali/, and our codebase with model outputs can be found here: https://github.com/pramitsahoo/culture-evaluation
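To make the idea of scoring adaptations against CSIs a little more concrete, here is a minimal sketch of one way such a check could look. It is not the authors' metric or code. The field names `concept` and `sub_region`, the five toy entries, the whole-word matching rule, and the helper names `csi_hits_by_sub_region` and `coverage` are all assumptions made for illustration; the real DIWALI schema and evaluation pipeline may differ.

```python
import re
from collections import Counter

# Hypothetical toy CSI entries for illustration only;
# the real DIWALI schema and contents may differ.
TOY_CSIS = [
    {"concept": "dhokla", "sub_region": "Gujarat"},
    {"concept": "garba", "sub_region": "Gujarat"},
    {"concept": "idli", "sub_region": "Tamil Nadu"},
    {"concept": "pongal", "sub_region": "Tamil Nadu"},
    {"concept": "momo", "sub_region": "Sikkim"},
]


def csi_hits_by_sub_region(text, csis):
    """Count, per sub-region, how many distinct CSIs appear in text as whole words."""
    lowered = text.lower()
    hits = Counter()
    for item in csis:
        pattern = r"\b" + re.escape(item["concept"].lower()) + r"\b"
        if re.search(pattern, lowered):
            hits[item["sub_region"]] += 1
    return hits


def coverage(hits, csis):
    """Fraction of sub-regions in the CSI list that have at least one hit."""
    regions = {item["sub_region"] for item in csis}
    covered = {region for region in regions if hits[region] > 0}
    return len(covered) / len(regions)


# A made-up "adapted" sentence standing in for an LLM output.
adapted = "After the garba night, the family shared dhokla and a plate of idli."
hits = csi_hits_by_sub_region(adapted, TOY_CSIS)
print(dict(hits))
print(round(coverage(hits, TOY_CSIS), 2))
```

A count like this only shows which sub-regions' items turn up in the output. It cannot tell whether an adaptation is deep or merely swaps in a few familiar words, which is presumably why the abstract also relies on an LLM judge and human evaluation.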

cs / cs.CL
