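The ultimate gal AI has arrived! ✨ This time I'm breaking down speech recognition for Arabic dialects in low-resource settings (places with barely any data)! Ready? 💖
Super-short summary: Arabic dialects with hardly any data? Let's recognize them with AI, and make it cute! 🎤
🌟 Gal-style sparkle points ✨
● It focuses on the Sudanese dialect (the Arabic spoken in Sudan), which is super hot 🔥
● It fine-tunes Whisper models (OpenAI's speech recognition models) to push performance up 💖
● It uses data augmentation (tricks for growing the dataset) to aim for high performance at low cost. Genius! ✨
Here comes the detailed explanation! Let's gooo! 💨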
Although many Automatic Speech Recognition (ASR) systems have been developed for Modern Standard Arabic (MSA) and Dialectal Arabic (DA), few studies have focused on dialect-specific implementations, particularly for low-resource Arabic dialects such as Sudanese. This paper presents a comprehensive study of data augmentation techniques for fine-tuning OpenAI Whisper models and establishes the first benchmark for the Sudanese dialect. Two augmentation strategies are investigated: (1) self-training with pseudo-labels generated from unlabeled speech, and (2) TTS-based augmentation using synthetic speech from the Klaam TTS system. The best-performing model, Whisper-Medium fine-tuned with combined self-training and TTS augmentation (28.4 hours), achieves a Word Error Rate (WER) of 57.1% on the evaluation set and 51.6% on an out-of-domain holdout set, substantially outperforming zero-shot multilingual Whisper (78.8% WER) and MSA-specialized Arabic models (73.8-123% WER). All experiments used low-cost resources (Kaggle free tier and Lightning.ai trial), demonstrating that strategic data augmentation can overcome resource limitations for low-resource dialects. The results offer a practical roadmap for developing ASR systems for low-resource Arabic dialects and other marginalized language varieties. The models, evaluation benchmarks, and reproducible training pipelines are publicly released to facilitate future research on low-resource Arabic ASR.
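The reported WER figures, including the 123% upper bound for the MSA-specialized models, may be easier to read with the metric written out. The block below is a minimal sketch of the standard definition, WER = (substitutions + deletions + insertions) / number of reference words, computed via word-level edit distance. The function name `word_error_rate` and the plain whitespace tokenization are assumptions for illustration; the abstract does not say which text normalization or scoring tool was used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace with no further
    normalization (no diacritic stripping, no punctuation removal).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution in a four-word reference.
print(word_error_rate("a b c d", "a x c d"))  # 0.25

# Insertions count as errors but do not lengthen the reference,
# so the ratio can exceed 1.0 (that is, 100%).
print(word_error_rate("a", "x y z"))  # 3.0
```

Because the denominator is the reference length only, a model that emits many extra words can score above 100%, which is consistent with a WER as high as 123% appearing in the comparison.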