Ultra-short summary: research on making AI explanations way easier to understand! They use LLMs (large language models) so that anyone can follow AI's difficult inner workings ✨
✨ Gal-style sparkle points ✨
● AI explanations might get as easy to follow as stanning your fave. You want to know everything about your fave, right? 💖
● No difficult math involved! Anyone can come to love AI even more 😍
● Big for business too! AI transparency goes up, so trust skyrockets ⤴️
Now for the detailed breakdown!
Background: Recent AI models keep getting smarter, but inside they're like a black box, and you often can't tell why they made a given prediction, right? 🤔 Fixing that is where this research starts. XAI (explainable AI) techniques have been used to explain AI's outputs, but there was a catch: only experts could actually understand the explanations 😢
Explainable AI (XAI) methods like SHAP and LIME produce numerical feature attributions that remain inaccessible to non-expert users. Prior work has shown that Large Language Models (LLMs) can transform these outputs into natural language explanations (NLEs), but it remains unclear which factors contribute to high-quality explanations. We present a systematic factorial study investigating how forecasting model choice, XAI method, LLM selection, and prompting strategy affect NLE quality. Our design spans four models (XGBoost (XGB), Random Forest (RF), Multilayer Perceptron (MLP), and SARIMAX, comparing black-box machine-learning (ML) models against a classical time-series approach), three XAI conditions (SHAP, LIME, and a no-XAI baseline), three LLMs (GPT-4o, Llama-3-8B, DeepSeek-R1), and eight prompting strategies. Using G-Eval, an LLM-as-a-judge evaluation method, with dual LLM judges and four evaluation criteria, we evaluate 660 explanations for time-series forecasting. Our results suggest that: (1) XAI provides only small improvements over no-XAI baselines, and only for expert audiences; (2) LLM choice dominates all other factors, with DeepSeek-R1 outperforming GPT-4o and Llama-3; (3) we observe an interpretability paradox: in our setting, SARIMAX yielded lower NLE quality than the ML models despite higher prediction accuracy; (4) zero-shot prompting is competitive with self-consistency at seven times lower cost; and (5) chain-of-thought prompting hurts rather than helps.
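To make the XAI-to-LLM pipeline concrete, here is a minimal sketch of the kind of step the abstract describes: turning SHAP/LIME-style feature attributions into a zero-shot prompt that asks an LLM for a natural language explanation. This is purely illustrative; the function name, field names, and prompt wording are assumptions, not the paper's actual pipeline.

```python
def build_explanation_prompt(prediction, attributions, audience="non-expert"):
    """Format XAI feature attributions into a zero-shot LLM prompt.

    Illustrative sketch only: `attributions` mimics the per-feature
    numeric scores that SHAP or LIME produce for one forecast.
    """
    # Rank features by absolute attribution, as attribution outputs
    # are usually presented most-influential first.
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [f"- {name}: {value:+.3f}" for name, value in ranked]
    return (
        f"The forecasting model predicted {prediction}.\n"
        "Feature attributions (positive values push the forecast up):\n"
        + "\n".join(lines)
        + f"\n\nExplain this prediction in plain language for a {audience} audience."
    )

# Hypothetical attributions for one time-series forecast.
prompt = build_explanation_prompt(
    prediction=142.5,
    attributions={"temperature": 0.82, "holiday_flag": -0.31, "lag_24h": 1.05},
)
print(prompt)
```

The prompt string would then be sent to whichever LLM is under study (GPT-4o, Llama-3-8B, or DeepSeek-R1), and the returned explanation scored with an LLM judge along the lines of G-Eval.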