Ultra-short summary: A technique for smartly customizing speech recognition models! It aims for even better accuracy in specialized domains ✨
● Optimizes the learning rate so model training goes smoothly 🎵 ● Domain-specific data augmentation boosts speech recognition accuracy ⤴ ● Works with many ASR models & many languages, making it super versatile 👑
● Background: ASR models (speech recognition models) are amazing at understanding all kinds of speech! But they struggle with the vocabulary of specific fields like medicine or law 🥺 MARCO-ASR was created to solve exactly that problem ✨
● Method: The model is tuned (fine-tuned) by adjusting the learning rate and augmenting the data 💖 That way it can handle technical terms and the distinctive way people speak in each field!
Automatic Speech Recognition (ASR) models have achieved remarkable accuracy in general settings, yet their performance often degrades in domain-specific applications due to data mismatch and linguistic variability. This challenge is amplified for modern Large Language Model (LLM)-based ASR systems, whose massive scale and complex training dynamics make effective fine-tuning non-trivial. To address this gap, this paper proposes a principled and metric-driven fine-tuning framework for adapting both traditional and LLM-based ASR models to specialized domains. The framework emphasizes learning rate optimization based on performance metrics, combined with domain-specific data transformation and augmentation. We empirically evaluate our framework on state-of-the-art models, including Whisper, Whisper-Turbo, and Qwen2-Audio, across multi-domain, multilingual, and multi-length datasets. Our results not only validate the proposed framework but also establish practical protocols for improving domain-specific ASR performance while preventing overfitting.
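The abstract's "learning rate optimization based on performance metrics" can be illustrated with a minimal sketch: sweep candidate learning rates and keep the one that minimizes validation word error rate (WER). This is not the paper's implementation; `fine_tune_and_evaluate` and `fake_eval` are hypothetical stand-ins for a short fine-tuning run (e.g. of Whisper) followed by scoring on held-out domain data.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over word tokens, normalized."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / max(len(ref), 1)


def select_learning_rate(candidates, fine_tune_and_evaluate):
    """Return the candidate learning rate with the lowest validation WER."""
    return min(candidates, key=fine_tune_and_evaluate)


# Toy stand-in for fine-tune-then-score: pretend the mid-range LR
# yields the best transcription of a domain-specific reference.
REFERENCE = "the patient has acute appendicitis"

def fake_eval(lr: float) -> float:
    hyps = {1e-5: "the patient has acute apendicitis",
            1e-4: "the patient has acute appendicitis",
            1e-3: "the patient has a cute appendix"}
    return wer(REFERENCE, hyps[lr])

best_lr = select_learning_rate([1e-5, 1e-4, 1e-3], fake_eval)
print(best_lr)  # 0.0001
```

In practice, each evaluation would be a short fine-tuning job on the augmented domain data rather than a lookup table, with early stopping on the same validation metric to guard against the overfitting the abstract warns about.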