Published: 2026/1/8 15:11:04

Prediction entropy levels up audio classification ☆ (TL;DR: a big boost to audio classification accuracy!)

1. Sparkly Highlights ✨
● This research makes audio classification way smarter!
● How you handle prompts (the instruction text) turns out to be super important!
● It boosts accuracy even with little data, which is amazing!

2. Detailed Explanation

*Background*
AI that can do all sorts of things with audio is awesome, right? But the catch was that accuracy swings depending on how the prompt (instruction text) is worded 😢

*Method*
They weighted the prompts so as to shrink the prediction's "uncertainty" (entropy)! Like automatically figuring out which prompts actually help? 🧐
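The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the finite-difference optimizer, the function names, and the toy logits are all my own assumptions; the paper only specifies that prompt weights are chosen to minimize prediction entropy.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def entropy(p):
    # Shannon entropy of a probability vector (natural log)
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def predict(logits, alpha):
    # logits: [num_prompts, num_classes] audio-text similarity scores for one sample
    # alpha:  unconstrained weights; softmax keeps them on the probability simplex
    w = softmax(alpha)
    return softmax(w @ logits), w  # weighted prompt ensemble -> class probabilities

def fit_prompt_weights(logits, steps=200, lr=0.5, eps=1e-4):
    # Minimize prediction entropy over prompt weights, here with a crude
    # finite-difference gradient descent (a stand-in for a real optimizer).
    alpha = np.zeros(logits.shape[0])  # start from the uniform ensemble
    for _ in range(steps):
        p0, _ = predict(logits, alpha)
        h0 = entropy(p0)
        grad = np.zeros_like(alpha)
        for i in range(len(alpha)):
            a = alpha.copy()
            a[i] += eps
            pi, _ = predict(logits, a)
            grad[i] = (entropy(pi) - h0) / eps
        alpha -= lr * grad
    return softmax(alpha)
```

With a confident prompt (peaked logits) and a noisy one (near-flat logits), the weight shifts toward the confident prompt, since that lowers the entropy of the combined prediction:

```python
logits = np.array([[4.0, 0.0, 0.0],   # informative prompt
                   [1.0, 0.9, 1.1]])  # uninformative prompt
w = fit_prompt_weights(logits)        # w[0] ends up well above 0.5
```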

*Results*
Guess what! Accuracy went up! And it works even with little data! ✨ It applies to all kinds of sounds, so it's seriously awesome!

*Significance (the ♡ killer point)*
Boosting audio classification accuracy even with few labeled (ground-truth) examples is godly, right? 💖 It can be applied to all kinds of services, so the future looks exciting!

Read the rest in the "らくらく論文" app

Leveraging Prediction Entropy for Automatic Prompt Weighting in Zero-Shot Audio-Language Classification

Karim El Khoury / Maxime Zanella / Tiffanie Godelaine / Christophe De Vleeschouwer / Benoit Macq

Audio-language models have recently demonstrated strong zero-shot capabilities by leveraging natural-language supervision to classify audio events without labeled training data. Yet, their performance is highly sensitive to the wording of text prompts, with small variations leading to large fluctuations in accuracy. Prior work has mitigated this issue through prompt learning or prompt ensembling. However, these strategies either require annotated data or fail to account for the fact that some prompts may negatively impact performance. In this work, we present an entropy-guided prompt weighting approach that aims to find a robust combination of prompt contributions to maximize prediction confidence. To this end, we formulate a tailored objective function that minimizes prediction entropy to yield new prompt weights, utilizing low entropy as a proxy for high confidence. Our approach can be applied to individual samples or a batch of audio samples, requiring no additional labels and incurring negligible computational overhead. Experiments on five audio classification datasets covering environmental, urban, and vocal sounds demonstrate consistent gains compared to classical prompt ensembling methods in a zero-shot setting, with accuracy improvements five times larger across the whole benchmark.

cs / cs.SD / cs.LG