Published: 2025/12/4 2:26:56

Wait, missing labels are treasure 💎? Let me show you a super cool way to use semi-supervised learning!

Ultra-short summary: Even when your data is short on labels, you can find meaning in the missingness (the very absence of labels) and seriously boost your AI's accuracy!

✨ Gal-Style Sparkle Points ✨

  • Even with only a few labels, the missingness itself carries information that lifts accuracy! So smart 💖
  • Looks like a big win for fields like medicine and autonomous driving, where collecting labelled data is hard 😍
  • Business opportunities galore! You might even build whole new services with this ✨

On to the detailed breakdown!

Read the rest in the "らくらく論文" app

Informative missingness and its implications in semi-supervised learning

Jinran Wu / You-Gan Wang / Geoffrey J. McLachlan

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance prediction performance. This defines an incomplete-data problem, which can be formulated statistically within the likelihood framework for finite mixture models fitted using the expectation-maximisation (EM) algorithm. Ideally, one would prefer a completely labelled sample, as one would anticipate that a labelled observation provides more information than an unlabelled one. However, when the mechanism governing label absence depends on the observed features or the class labels or both, the missingness indicators themselves contain useful information. In certain situations, the information gained from modelling the missing-label mechanism can even outweigh the loss due to missing labels, yielding a classifier with a smaller expected error than one based on a completely labelled sample. This improvement arises particularly when class overlap is moderate, labelled data are sparse, and the missingness is informative. Modelling such informative missingness thus offers a coherent statistical framework that unifies likelihood-based inference with the behaviour of empirical SSL methods.
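To make the mixture-model framing concrete, here is a minimal sketch (not the authors' code) of semi-supervised EM on a toy two-component Gaussian mixture: labelled points get hard 0/1 responsibilities in the E-step, while unlabelled points get soft posterior responsibilities. All data and parameters are illustrative; note this sketch ignores the label-missingness mechanism itself, whereas the paper's point is that additionally modelling p(label missing | features, class) can add information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two 1-D Gaussian classes centred at -1 and +1 (illustrative only)
n = 400
y_true = rng.integers(0, 2, n)
x = rng.normal(loc=np.where(y_true == 0, -1.0, 1.0), scale=1.0)

# Only ~10% of labels are observed; -1 marks a missing label
observed = rng.random(n) < 0.1
y_obs = np.where(observed, y_true, -1)

def norm_pdf(x, m, s):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def ssl_em(x, y_obs, n_iter=50):
    """Semi-supervised EM for a two-component Gaussian mixture.

    Labelled points keep hard responsibilities; unlabelled points get
    soft posterior responsibilities, then the M-step does weighted updates.
    """
    pi = 0.5
    mu = np.array([x.min(), x.max()])   # spread-out init keeps components ordered
    sd = np.array([1.0, 1.0])
    for _ in range(n_iter):
        # E-step: posterior probability of class 1 for every point
        p1 = pi * norm_pdf(x, mu[1], sd[1])
        p0 = (1 - pi) * norm_pdf(x, mu[0], sd[0])
        r1 = p1 / (p0 + p1)
        r1[y_obs == 0] = 0.0            # observed labels are fixed
        r1[y_obs == 1] = 1.0
        # M-step: responsibility-weighted mixing weight, means, std devs
        pi = r1.mean()
        mu = np.array([np.average(x, weights=1 - r1),
                       np.average(x, weights=r1)])
        sd = np.array([np.sqrt(np.average((x - mu[0]) ** 2, weights=1 - r1)),
                       np.sqrt(np.average((x - mu[1]) ** 2, weights=r1))])
    return pi, mu, sd

pi, mu, sd = ssl_em(x, y_obs)
# Classify every point by its posterior; the fitted means should land
# near the true class centres -1 and +1
pred = (pi * norm_pdf(x, mu[1], sd[1]) >
        (1 - pi) * norm_pdf(x, mu[0], sd[0])).astype(int)
acc = (pred == y_true).mean()
print(mu, acc)
```

The hard-vs-soft responsibility split is exactly the incomplete-data likelihood view in the abstract: labelled observations contribute their known component, unlabelled ones contribute a mixture term.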

cs / stat.ML / cs.LG