Published: 2026/1/8 13:31:51

Decision Trees vs. Neural Nets! Unlocking AI's Secrets 💘

Super summary: This research tries to make the "hard-to-see" inner workings of neural nets as easy to understand as a decision tree ✨

Gal-Style Sparkle Points ✨
● It compares decision trees (simple, easy-to-read AI) with neural nets (high-performing but mysterious)!
● It explains why neural nets are hard to interpret using a metric called RTV (a measure of geometric complexity)!
● It proposes a way to make AI explain its "why?" so it can be useful in all kinds of fields!

Detailed Explanation

Background: With the recent AI boom, neural nets are amazing, but their insides are a black box 🗃️. Since you can't tell why they made a given decision, they're hard to use in high-stakes areas like medicine and finance… 😭 That's why people want AI that anyone can understand, like decision trees!

Method: To measure how complicated a neural net is, the paper uses a metric called RTV (Radon Total Variation)! 📏 RTV "visualizes" the geometric complexity of what the network has learned. To get close to a simple, decision-tree-style decision boundary (the line where the model switches its answer), you want the RTV to be small!
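To make the idea concrete, here is a minimal sketch contrasting a hard, tree-style box indicator with a smoothed score whose threshold recovers roughly the same box. The box, the ramp-product form of the score, and the smoothing width `eps` are illustrative choices for this example, not the paper's exact construction:

```python
import numpy as np

# Hypothetical axis-aligned box A = [0.2, 0.8]^2, like a region a decision tree carves out.
lo, hi, eps = 0.2, 0.8, 0.05  # eps = smoothing width (illustrative choice)

def indicator(x):
    """Hard tree indicator 1_A: exactly 1 inside the box, 0 outside."""
    return float(np.all((x >= lo) & (x <= hi)))

def smooth_score(x):
    """A smooth surrogate score: product of per-coordinate ramps of width eps.
    Thresholding it gives back (approximately) the same box."""
    ramps = np.clip((x - lo) / eps, 0.0, 1.0) * np.clip((hi - x) / eps, 0.0, 1.0)
    return float(np.prod(ramps))

x_inside, x_outside = np.array([0.5, 0.5]), np.array([0.1, 0.5])
print(indicator(x_inside), indicator(x_outside))                    # 1.0 0.0
print(smooth_score(x_inside) > 0.5, smooth_score(x_outside) > 0.5)  # True False
```

Both functions classify these two points the same way; the difference the paper cares about is that the hard jump of `indicator` is geometrically "infinitely complex" under RTV, while a suitably smoothed score can have finite complexity.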

Read the rest in the らくらく論文 app

A Gap Between Decision Trees and Neural Networks

Akash Kumar

We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Radon total variation ($\mathrm{RTV}$) seminorm, which controls the geometric complexity of level sets. We first show that the hard tree indicator $1_A$ has infinite $\mathrm{RTV}$. Moreover, two natural split-wise continuous surrogates--piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing--also have infinite $\mathrm{RTV}$ in dimensions $d>1$, while Gaussian convolution yields finite $\mathrm{RTV}$ but with an explicit exponential dependence on $d$. We then separate two goals that are often conflated: classification after thresholding (recovering the decision set) versus score learning (learning a calibrated score close to $1_A$). For classification, we construct a smooth barrier score $S_A$ with finite $\mathrm{RTV}$ whose fixed threshold $\tau=1$ exactly recovers the box. Under a mild tube-mass condition near $\partial A$, we prove an $L_1(P)$ calibration bound that decays polynomially in a sharpness parameter, along with an explicit $\mathrm{RTV}$ upper bound in terms of face measures. Experiments on synthetic unions of rectangles illustrate the resulting accuracy--complexity tradeoff and how threshold selection shifts where training lands along it.
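The abstract's split between classification after thresholding and score learning can be illustrated with a toy Monte-Carlo estimate. Everything here is a hypothetical stand-in (a fixed box, $P$ uniform on the unit square, a ramp-product score rather than the paper's barrier score $S_A$): both error notions are supported in the $\varepsilon$-tube around $\partial A$, so both shrink as the smoothing width shrinks, but they measure different things:

```python
import numpy as np

rng = np.random.default_rng(0)
lo, hi, eps = 0.2, 0.8, 0.05                   # hypothetical box [0.2, 0.8]^2 and smoothing width
X = rng.uniform(0.0, 1.0, size=(200_000, 2))   # samples from P = uniform on the unit square

ind = np.all((X >= lo) & (X <= hi), axis=1).astype(float)        # hard indicator 1_A
ramps = np.clip((X - lo) / eps, 0, 1) * np.clip((hi - X) / eps, 0, 1)
score = ramps.prod(axis=1)                                       # smoothed surrogate score

cls_err = np.mean((score > 0.5) != (ind > 0.5))  # error after thresholding (classification goal)
l1_err = np.mean(np.abs(score - ind))            # L1(P) distance to 1_A (score-learning goal)

# Both errors come from the eps-tube around the box boundary; away from it,
# score and indicator agree exactly.
print(f"classification error ~ {cls_err:.3f}, L1 calibration error ~ {l1_err:.3f}")
```

Shrinking `eps` drives both quantities toward zero here, which mirrors the tradeoff the paper studies: sharper scores track $1_A$ better but, per the $\mathrm{RTV}$ analysis, at the cost of higher geometric complexity.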

cs / cs.LG / cs.AI / stat.ML