Published: 2026/1/5 13:10:09

Supercharge large-scale data training! ASSS is the strongest 🔥

Ultra-short summary: smartly cut AI training time! A divine technique for picking data ✨

✨ Gal-Style Sparkle Points ✨

● AI picks your data for you! Just like a personal shopper 💅
● Shorter training time means your fave AI grows at top speed 🚀
● Useless data gets tossed! Cut the waste and boost efficiency ⤴

Here comes the detailed explanation~!

Read the rest in the "らくらく論文" app

A Differentiable Adversarial Framework for Task-Aware Data Subsampling

Jiacheng Lyu / Bihua Bao

The proliferation of large-scale datasets poses a major computational challenge to model training. Traditional data subsampling works as a static, task-independent preprocessing step that often discards information critical to downstream prediction. In this paper, we introduce the Adversarial Soft Selection Subsampling (ASSS) framework, a novel paradigm that recasts data reduction as a differentiable, end-to-end learning problem. ASSS stages an adversarial game between a selector network and a task network, in which the selector learns to assign continuous importance weights to samples. This direct optimization, implemented via a Gumbel-Softmax relaxation, allows the selector to identify and retain the most informative samples for a specific task objective, guided by a loss function that balances predictive fidelity against sparsity. A theoretical analysis connects the framework to the information bottleneck principle. Comprehensive experiments on four large-scale real-world datasets show that ASSS consistently outperforms heuristic subsampling baselines such as clustering and nearest-neighbor thinning in preserving model performance. Notably, ASSS can match, and sometimes even exceed, the performance of training on the full dataset, demonstrating an intelligent denoising effect. This work establishes task-aware data subsampling as a learnable component, providing a principled solution for efficient large-scale learning.
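To make the mechanism concrete, here is a minimal PyTorch sketch of the idea the abstract describes: a selector network produces soft keep/drop weights via a Gumbel-Softmax relaxation, and the loss trades off weighted task fidelity against a sparsity penalty. All class names, architectures, and hyperparameters below are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Selector(nn.Module):
    """Illustrative selector: maps each sample to keep/drop logits."""

    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # logits for {drop, keep}
        )

    def forward(self, x, tau=0.5):
        logits = self.net(x)
        # Differentiable relaxation of a hard keep/drop decision.
        w = F.gumbel_softmax(logits, tau=tau, hard=False)
        return w[:, 1]  # soft weight of the "keep" option, in (0, 1)

def asss_loss(task_loss_per_sample, keep_weights, lam=0.1):
    # Fidelity term: task loss weighted by how strongly each sample is kept.
    fidelity = (keep_weights * task_loss_per_sample).mean()
    # Sparsity term: penalize keeping too large a fraction of the data.
    sparsity = keep_weights.mean()
    return fidelity + lam * sparsity

# Toy usage on random regression data.
torch.manual_seed(0)
x = torch.randn(64, 8)
y = torch.randn(64, 1)
selector = Selector(8)
task = nn.Linear(8, 1)

w = selector(x)
per_sample = F.mse_loss(task(x), y, reduction="none").mean(dim=1)
loss = asss_loss(per_sample, w)
loss.backward()  # gradients reach both the selector and the task network
```

The design choice to relax the binary keep/drop decision is what makes the whole pipeline trainable by gradient descent; at inference time one would typically threshold or harden the weights to obtain an actual subsample.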

cs / cs.LG