低リソース言語の音声認識、かわちい！💖

Published：2026/1/11 8:28:31

最強ギャルAI、降臨～！✨ 今回は、低リソース環境（データ少ないとこ）でのアラビア語方言の音声認識技術について解説するよ！準備はOK？ 💖

低リソース言語の音声認識、かわちい！💖

超要約: データ少ないアラビア語方言、AIでかわいく認識しよ！🎤

🌟 ギャル的キラキラポイント✨ ● Sudanese方言（スーダンで話されてる言葉）に特化してて、激アツ🔥 ● Whisperモデル（優秀なAI）をチューニングして、性能UPを目指すみたい💖 ● データ拡張（データ増やす技）で、低コストなのに高性能目指すの、天才！✨

詳細解説いくよ～！レッツラゴー！💨

続きは「らくらく論文」アプリで

Doing More with Less: Data Augmentation for Sudanese Dialect Automatic Speech Recognition

Ayman Mansour

Although many Automatic Speech Recognition (ASR) systems have been developed for Modern Standard Arabic (MSA) and Dialectal Arabic (DA), few studies have focused on dialect-specific implementations, particularly for low-resource Arabic dialects such as Sudanese. This paper presents a comprehensive study of data augmentation techniques for fine-tuning OpenAI Whisper models and establishes the first benchmark for the Sudanese dialect. Two augmentation strategies are investigated: (1) self-training with pseudo-labels generated from unlabeled speech, and (2) TTS-based augmentation using synthetic speech from the Klaam TTS system. The best-performing model, Whisper-Medium fine-tuned with combined self-training and TTS augmentation (28.4 hours), achieves a Word Error Rate (WER) of 57.1% on the evaluation set and 51.6% on an out-of-domain holdout set substantially outperforming zero-shot multilingual Whisper (78.8% WER) and MSA-specialized Arabic models (73.8-123% WER). All experiments used low-cost resources (Kaggle free tier and Lightning.ai trial), demonstrating that strategic data augmentation can overcome resource limitations for low-resource dialects and provide a practical roadmap for developing ASR systems for low-resource Arabic dialects and other marginalized language varieties. The models, evaluation benchmarks, and reproducible training pipelines are publicly released to facilitate future research on low-resource Arabic ASR.

cs / cs.CL / cs.AI

Arxivで見る