Published: 2025/12/3 15:11:44

Training ViTs (image recognition models) has its limits~! 😭 But there's still hope ✨

1. Super Summary: ViTs are amazing, but the gains from training hit a ceiling...! 😱 Still, small models might work if you use them smartly? 💖

2. Gal-Style Sparkle Points ✨

  • You might be able to do image recognition with a ViT even without tons of data! 😳
  • We might find ways to skip wasteful training! Savings! 💰
  • A big chance for IT companies to make image recognition easier to use 😍

3. Detailed Explanation


Diminishing Returns in Self-Supervised Learning

Oli Bridge / Huey Sun / Botond Branyicskai-Nagy / Charles D'Ornano / Shomit Basu

While transformer-based architectures have taken computer vision and NLP by storm, they often require a vast number of parameters and a large amount of training data to attain strong performance. In this work, we experiment with three distinct pre-training, intermediate fine-tuning, and downstream datasets and training objectives to explore their marginal benefits on a small 5M-parameter vision transformer. We find that while pre-training and fine-tuning always help our model, they show diminishing returns, and intermediate fine-tuning can actually harm downstream performance, potentially due to dissimilarity in task mechanics. Taken together, our results suggest that small-scale ViTs benefit most from targeted pre-training and careful data selection, while indiscriminate stacking of intermediate tasks can waste compute and even degrade performance.
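
To make the staged setup concrete, below is a minimal sketch (not the authors' code) of the three-stage pipeline the abstract describes: self-supervised pre-training, an optional intermediate fine-tuning stage, and downstream fine-tuning of a roughly 5M-parameter ViT. The model name, training loop, datasets, and hyperparameters are illustrative assumptions using PyTorch and the timm library.

# Illustrative sketch of a pre-train -> intermediate fine-tune -> downstream
# fine-tune pipeline for a small ViT. Dataset loaders and hyperparameters
# are placeholders, not the paper's actual configuration.
import torch
import torch.nn as nn
import timm

def build_small_vit(num_classes: int) -> nn.Module:
    # "vit_tiny_patch16_224" has roughly 5.7M parameters, close to the
    # 5M-parameter scale discussed in the abstract (an assumption here).
    return timm.create_model("vit_tiny_patch16_224", pretrained=False,
                             num_classes=num_classes)

def train_stage(model: nn.Module, loader, epochs: int, lr: float,
                device: str = "cpu") -> nn.Module:
    # Generic supervised loop used for the intermediate and downstream
    # stages; the self-supervised pre-training stage would swap in a
    # different objective (e.g. masked-image or contrastive loss).
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model

# Pipeline outline (loaders are hypothetical):
#   1. pre-train with a self-supervised objective on an unlabeled corpus
#   2. optionally fine-tune on an intermediate labeled task
#   3. fine-tune and evaluate on the downstream task
# The paper's finding is that stage 2 can hurt stage 3 when the task
# mechanics differ, so skipping it is sometimes the better choice.

The point of the sketch is only to show where the "diminishing returns" question lives: each call to a training stage adds compute, and the paper's result is that the marginal benefit of each added stage can shrink or even turn negative.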

cs / cs.CV