Published: 2025/12/25 19:43:48

Unlocking the secret of power-law scaling in deep learning training! ✨

Super-short summary: Turns out a new framework explains why deep learning training follows power laws!

Gal-style sparkle points
● Framing learning as energy on the move is such a vibe! 🤩
● They found four whole conditions for power-law scaling! Genius! 👏
● And it could help tackle real problems in the IT industry, amazing right!? 😳

Detailed explanation

Background
In deep learning, performance tends to follow a rule called a "power law" as model size and data volume grow. But exactly why that happens wasn't well understood! 🤔
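For reference, the empirical regularity in question is usually written as a power law in model size and dataset size. One common parameterization from the broader scaling-law literature (not from this paper; the constants are fitted, not derived) looks like this:

```latex
% A common empirical scaling-law ansatz (broader literature, not this paper):
% N = parameter count, D = dataset size, E = irreducible loss,
% A, B, alpha, beta = fitted constants.
\[
  L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
  \qquad \alpha, \beta > 0 .
\]
```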

Method
So this paper models the training process with a new framework called GRSD (Generalized Resolution-Shell Dynamics)! ✨ It pictures learning as energy moving across resolution scales!
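The paper's actual GRSD construction is more involved, but the basic bookkeeping idea, binning spectral energy into logarithmic "resolution shells" and watching it move as training proceeds, can be sketched in a few lines. Everything below (the function, the toy low-pass "model") is our own illustration, not the paper's code:

```python
import numpy as np

def shell_energies(signal, num_shells=8):
    """Bin the power spectrum of a 1-D signal into logarithmic
    frequency shells (shell j covers frequencies [2**j, 2**(j+1)))."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.arange(1, len(power))          # skip the DC component
    edges = 2.0 ** np.arange(num_shells + 1)  # 1, 2, 4, ..., 2**num_shells
    energies = np.zeros(num_shells)
    for j in range(num_shells):
        mask = (freqs >= edges[j]) & (freqs < edges[j + 1])
        energies[j] = power[1:][mask].sum()
    return energies

# Toy "training": a low-pass fit whose cutoff grows with the step count,
# so the residual's spectral energy drains out of the coarse shells first.
x = np.linspace(0.0, 1.0, 1024, endpoint=False)
target = np.sign(np.sin(8 * np.pi * x))       # square wave: many scales
for step in (0, 10, 100):
    coeffs = np.fft.rfft(target)
    coeffs[2 + step:] = 0.0                   # keep only low frequencies
    residual = target - np.fft.irfft(coeffs, n=len(x))
    print(step, np.round(shell_energies(residual), 1))
```

Run as-is, the printout shows the residual's energy vanishing shell by shell from coarse to fine, which is the kind of transport picture GRSD formalizes.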


When Does Learning Renormalize? Sufficient Conditions for Power Law Spectral Dynamics

Yizhou Zhang

Empirical power-law scaling has been widely observed across modern deep learning systems, yet its theoretical origins and scope of validity remain incompletely understood. The Generalized Resolution-Shell Dynamics (GRSD) framework models learning as spectral energy transport across logarithmic resolution shells, providing a coarse-grained dynamical description of training. Within GRSD, power-law scaling corresponds to a particularly simple renormalized shell dynamics; however, such behavior is not automatic and requires additional structural properties of the learning process. In this work, we identify a set of sufficient conditions under which the GRSD shell dynamics admits a renormalizable coarse-grained description. These conditions constrain the learning configuration at multiple levels, including boundedness of gradient propagation in the computation graph, weak functional incoherence at initialization, controlled Jacobian evolution along training, and log-shift invariance of renormalized shell couplings. We further show that power-law scaling does not follow from renormalizability alone, but instead arises as a rigidity consequence: once log-shift invariance is combined with the intrinsic time-rescaling covariance of gradient flow, the renormalized GRSD velocity field is forced into a power-law form.
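The abstract's closing sentence compresses a classical rigidity argument. Here is a hedged sketch in our own notation (the paper's statement concerns the renormalized GRSD velocity field, which we do not reproduce): any measurable positive function that transforms covariantly under rescaling is forced to be a power law.

```latex
% Toy rigidity lemma (our notation, not the paper's derivation):
% multiplicative covariance forces a power-law form.
\begin{align*}
  &\text{Assume } f(\lambda x) = g(\lambda)\, f(x)
     \text{ for all } \lambda, x > 0, \text{ with } f > 0 \text{ measurable.} \\
  &\text{Set } x = 1:\quad g(\lambda) = f(\lambda)/f(1)
     \;\Longrightarrow\; f(\lambda x)\, f(1) = f(\lambda)\, f(x). \\
  &\text{Let } h(u) = \ln\!\bigl(f(e^{u})/f(1)\bigr):\quad
     h(u + v) = h(u) + h(v) \quad \text{(Cauchy's functional equation).} \\
  &\text{Measurable solutions are linear: } h(u) = \alpha u,
     \text{ hence } f(x) = f(1)\, x^{\alpha}.
\end{align*}
```

Per the abstract, log-shift invariance supplies covariance under shifts in log-scale and gradient flow's time-rescaling covariance does the rest; the lemma above only illustrates why such symmetries leave no room for anything but a power law.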

cs / cs.LG / cs.AI