Published: 2025/12/23 18:05:55

A cute NN conquers the sphere! 🚀💕

Blazing-fast learning of low-degree spherical polynomials!

✨ Sparkly highlights ✨
● A 2-layer NN (neural network) boosts efficiency!
● The channel attention mechanism is adorable 💕
● Works even with a small sample size (little data)!

Here comes the detailed explanation~!

Background: In data analysis, we often have to deal with data that lives on sphere-like shapes! Think of the Earth 🌏 or 3D images! But when you try to get an AI to learn such data, you end up needing tons of samples, or the training just doesn't go well... that was the problem 😢

Continued in the 「らくらく論文」 app

Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Learnable Channel Attention

Yingzhen Yang

We study the problem of learning a low-degree spherical polynomial of degree $\ell_0 = \Theta(1) \ge 1$ defined on the unit sphere in $\mathbb{R}^d$ by training an over-parameterized two-layer neural network (NN) with channel attention. Our main result is a significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\epsilon \in (0,1)$, a carefully designed two-layer NN with channel attention and finite width $m \ge \Theta({n^4 \log (2n/\delta)}/{d^{2\ell_0}})$ trained by vanilla gradient descent (GD) requires a sample complexity of only $n \asymp \Theta(d^{\ell_0}/\epsilon)$ with probability $1-\delta$ for every $\delta \in (0,1)$, in contrast with the representative sample complexity $\Theta\left(d^{\ell_0} \max\{\epsilon^{-2},\log d\}\right)$, where $n$ is the training data size. Moreover, this sample complexity cannot be improved, since the trained network attains a sharp nonparametric regression risk of order $\Theta(d^{\ell_0}/{n})$ with probability at least $1-\delta$. On the other hand, the minimax optimal rate for the regression risk with a kernel of rank $\Theta(d^{\ell_0})$ is $\Theta(d^{\ell_0}/{n})$, so the rate of the nonparametric regression risk of the GD-trained network is minimax optimal. Training the two-layer NN with channel attention proceeds in two stages. In Stage 1, a provably learnable channel selection algorithm identifies the ground-truth channel number $\ell_0$ among the initial $L \ge \ell_0$ channels in the first-layer activation, with high probability. This learnable selection is achieved by an efficient one-step GD update on both layers, enabling feature learning for low-degree polynomial targets. In Stage 2, the second layer is trained by standard GD using the activation function with the selected channels.
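The two-stage procedure in the abstract can be caricatured in a toy numpy sketch. Everything below is illustrative, not the paper's actual construction: the channels are taken to be monomial powers $(w^\top x)^{k}$, the attention weights and learning rates are made up, and Stage 1 here updates only the attention weights (the paper's one-step GD acts on both layers). The point is just the shape of the algorithm: one gradient step selects dominant channels, then plain GD trains the second layer on the frozen selected features.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m, L = 8, 200, 64, 4   # input dim, samples, network width, initial channels
ell0 = 2                      # ground-truth polynomial degree (toy target below)

# Synthetic data: points on the unit sphere, target = a degree-ell0 polynomial
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
a = rng.normal(size=d)
a /= np.linalg.norm(a)
y = (X @ a) ** ell0

# First layer: random unit weights; channel k applies the activation z -> z**(k+1)
W = rng.normal(size=(m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
Z = X @ W.T                   # (n, m) pre-activations

def features(attn):
    """Channel-attention activation: weighted sum of per-channel activations."""
    return sum(attn[k] * Z ** (k + 1) for k in range(L)) / np.sqrt(m)

attn = np.ones(L) / L                  # uniform attention at initialization
v = rng.normal(size=m) / np.sqrt(m)    # second-layer weights

# Stage 1 (sketch): one GD step on the attention weights of the squared loss,
# then keep the dominant channel(s).
resid = y - features(attn) @ v
grad_attn = np.array(
    [-(2.0 / n) * resid @ ((Z ** (k + 1) / np.sqrt(m)) @ v) for k in range(L)]
)
attn -= 0.5 * grad_attn
selected = np.argsort(-np.abs(attn))[:1]

# Stage 2: standard GD on the second layer, using only the selected channels
mask = np.zeros(L)
mask[selected] = 1.0
Phi = features(attn * mask)            # frozen first-layer features
for _ in range(500):
    v -= 0.1 * (-(2.0 / n) * Phi.T @ (y - Phi @ v))

mse = np.mean((y - Phi @ v) ** 2)
print(f"train MSE: {mse:.4f}")
```

This toy version carries none of the paper's guarantees (the sample-complexity and minimax results rely on the specific spherical-harmonics structure of the activation), but it makes the division of labor concrete: attention weights are the only thing learned in Stage 1, and the second layer is the only thing learned in Stage 2.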

cs / stat.ML / cs.LG / math.OC