Published: 2025/8/23 1:51:45

A grammar-checking AI for Quebec French is born! So what exactly is QFrCoLA? ✨

Super-short summary: QFrCoLA, an awesome new dataset for testing the French skills of LLMs (large language models), has arrived! It might just spark a revolution in the IT industry 🚀

✨ Gal-Style Sparkle Points ✨

● Focused on Quebec French (the French spoken in Canada)! It differs a bit from other varieties of French, so regional dialect features are taken into account 😉
● Puts LLMs' French skills to a serious test! The AI judges whether each sentence is grammatically acceptable or not (see the little code sketch below 👇) 😳
● Great news for the IT industry! Translation, chatbots... a big chance for French-language services to level up 💕
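By the way, here's a tiny Python sketch of what "the AI judges whether a sentence is acceptable" could look like in practice, using the Hugging Face `transformers` library. This is only an illustration, not code from the paper: the checkpoint name `my-org/camembert-qfrcola` is a made-up placeholder, and the label convention is an assumption, so swap in any classifier actually fine-tuned on QFrCoLA-style labels 💻

```python
from transformers import pipeline

# Hypothetical checkpoint: a French encoder fine-tuned on QFrCoLA-style
# (sentence, acceptable/unacceptable) pairs. Not an actual released model.
classifier = pipeline("text-classification", model="my-org/camembert-qfrcola")

sentences = [
    "Je vais magasiner demain matin.",       # everyday Quebec-French sentence
    "Je vais à magasiner le demain matin.",  # deliberately ill-formed variant
]

for sentence in sentences:
    result = classifier(sentence)[0]
    # Label names depend on the fine-tuning setup (assumption: LABEL_1 = acceptable).
    print(f"{sentence} -> {result['label']} (score={result['score']:.3f})")
```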

Detailed Explanation


QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments

David Beauchemin / Richard Khoury

Large and Transformer-based language models perform outstandingly in various downstream tasks. However, there is limited understanding of how these models internalize linguistic knowledge, so various linguistic benchmarks have recently been proposed to facilitate syntactic evaluation of language models across languages. This paper introduces QFrCoLA (Quebec-French Corpus of Linguistic Acceptability Judgments), a normative binary acceptability judgment dataset comprising 25,153 in-domain and 2,675 out-of-domain sentences. Our study leverages the QFrCoLA dataset and seven other binary linguistic acceptability judgment corpora to benchmark seven language models. The results demonstrate that, on average, fine-tuned Transformer-based LMs are strong baselines for most languages and that large language models used as zero-shot binary classifiers perform poorly on the task. For the QFrCoLA benchmark as well, a fine-tuned Transformer-based LM on average outperformed the other methods tested. Our results also show that the pre-trained cross-lingual LLMs selected for our experiments do not seem to have acquired linguistic judgment capabilities for Quebec French during their pre-training. Finally, our experimental results on QFrCoLA show that our dataset, built from examples that illustrate linguistic norms rather than speakers' intuitions, is similar to a linguistic acceptability judgment dataset; it is a challenging dataset that can benchmark LMs on their linguistic judgment capabilities.
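For readers who want a feel for the fine-tuned baselines described in the abstract, here is a hedged sketch of training a binary acceptability classifier with Hugging Face `transformers`. The base checkpoint (`camembert-base`), file names, and column names (`sentence`, `label`) are assumptions for illustration, not details taken from the paper; consult the actual QFrCoLA release for its real format and splits.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "camembert-base"  # a French BERT-style encoder; the paper's exact checkpoints may differ

# Assumed file layout: CSV files with a "sentence" column and an integer "label"
# column (assumption: 1 = acceptable, 0 = unacceptable).
dataset = load_dataset(
    "csv",
    data_files={"train": "qfrcola_train.csv", "validation": "qfrcola_dev.csv"},
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)


def tokenize(batch):
    # Tokenize sentences; padding is handled dynamically by the data collator.
    return tokenizer(batch["sentence"], truncation=True, max_length=128)


tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=2)

training_args = TrainingArguments(
    output_dir="qfrcola-classifier",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)

trainer.train()
print(trainer.evaluate())
```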

cs / cs.CL