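1. Gyaru-style sparkle points ✨
● They went and built "DIWALI", an evaluation dataset dedicated to Indian culture! 🇮🇳
● You can see at a glance what LLMs are good and bad at for each region of India 👀
● Hot research that is sure to make waves in the IT industry 🔥
2. Detailed explanation
3. Real-world use ideas 💡
● An India travel app that serves up information tailored to each region's culture! 📱
● An AI chatbot that picks expressions suited to the person when you chat with friends from India! 💬
4. For those who want to dig deeper 🔍
● LLM (large language model)
● Indian culture
● Localization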
Large language models (LLMs) are widely used in various tasks and applications. However, despite their wide capabilities, they have been shown to lack cultural alignment \citep{ryan-etal-2024-unintended, alkhamissi-etal-2024-investigating} and to produce biased generations \cite{naous-etal-2024-beer} due to a lack of cultural knowledge and competence. Evaluating LLMs for cultural awareness and alignment is particularly challenging because proper evaluation metrics are lacking and culturally grounded datasets that represent the vast complexity of cultures at the regional and sub-regional levels are unavailable. Existing datasets of culture-specific items (CSIs) focus primarily on concepts at the regional level and may contain false positives. To address this issue, we introduce a novel CSI dataset for Indian culture that covers 17 cultural facets. The dataset comprises ~8k cultural concepts from 36 sub-regions. To measure the cultural competence of LLMs on a cultural text adaptation task, we evaluate the adaptations using the created CSIs, LLM-as-a-judge, and human evaluations from diverse socio-demographic regions. Furthermore, we perform a quantitative analysis demonstrating selective sub-regional coverage and surface-level adaptations across all considered LLMs. Our dataset is available at https://huggingface.co/datasets/nlip/DIWALI , the project webpage is at https://nlip-lab.github.io/nlip/publications/diwali/ , and our codebase with model outputs can be found at https://github.com/pramitsahoo/culture-evaluation
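To make the idea of CSI-based evaluation and sub-regional coverage more concrete, here is a minimal sketch of one way such a count could be computed. It is not the authors' metric or code: the lexicon `TOY_CSIS`, its facet labels, the helper names `find_csis` and `subregion_coverage`, and the sample sentences are all hypothetical and are not taken from DIWALI. The sketch assumes that an adaptation is credited with a CSI whenever the concept appears as a whole word in the adapted text.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: concept -> (facet, sub-region).
# Illustrative entries only; these are NOT rows from the DIWALI dataset.
TOY_CSIS = {
    "idli": ("food", "Tamil Nadu"),
    "dhokla": ("food", "Gujarat"),
    "garba": ("dance", "Gujarat"),
    "bihu": ("festival", "Assam"),
}


def find_csis(text, lexicon):
    """Return the lexicon concepts that occur as whole words in `text`.

    Matching is case-insensitive; results follow lexicon order.
    """
    lowered = text.lower()
    return [
        concept
        for concept in lexicon
        if re.search(r"\b" + re.escape(concept.lower()) + r"\b", lowered)
    ]


def subregion_coverage(texts, lexicon):
    """Count, per sub-region, how many CSI matches occur across `texts`."""
    counts = Counter()
    for text in texts:
        for concept in find_csis(text, lexicon):
            _facet, subregion = lexicon[concept]
            counts[subregion] += 1
    return counts


# Hypothetical model outputs for a cultural text adaptation task.
adapted = [
    "She ate idli for breakfast and practised garba in the evening.",
    "They shared dhokla before the garba night.",
]

coverage = subregion_coverage(adapted, TOY_CSIS)
print(dict(coverage))  # {'Tamil Nadu': 1, 'Gujarat': 3}
```

In this toy run, Assam receives no matches at all, which is the kind of skew the abstract calls selective sub-regional coverage. A plain string match like this only shows that a concept was mentioned, not that it was used appropriately, which is presumably why the abstract pairs CSI-based checks with LLM-as-a-judge and human evaluation.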