Spotting data value with VFL?! 💖

Published: 2025/12/16 8:01:39

The ultimate gyaru AI explains! A way to spot how much data is worth with VFL?! 💖

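Title & Super Summary
A technique for safely evaluating how much each party's data is worth in VFL (Vertical Federated Learning, where companies learn together without sharing their secret data)! ✨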
Gyaru-Style Sparkle Points
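- You can measure how much each party's data contributes while keeping it secret. Amazing! 😎
- You can tell what the data is worth before any model is built (in the prep stage). That's godlike! ✨
- Data providers get credited fairly too, so they're happy 💕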
Detailed Explanation
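- Background: Vertical Federated Learning (VFL), where several companies each hold different features about the same users and train AI together without sharing their secret data, is super hot right now! But then you can't tell whose data contributes how much, right? 🤔 That's exactly the problem this research solves!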
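- Method: It uses a method called "Shapley-CMI", which scores each party's features with conditional mutual information (CMI), no trained model needed. To keep everything secret, a PSI (private set intersection) server computes the encrypted intersection sizes the method needs, so no raw data is ever exchanged! Way too smart! 💖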
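- Results: Initial experiments confirm that the private version computes Shapley-CMI values correctly while keeping the data private! So you really can measure data value safely! Yay! 🥳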
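- Significance (the totally wild ♡ point): You know which data matters before building a model, so you can build a good model with zero waste! Data providers can proudly say "See how great my data is!", and everyone's happy ♪😆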
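
To make the Shapley-CMI idea concrete, here is a minimal Python sketch on plaintext toy data (no PSI server, no privacy at all). It assumes the value of a feature set S is the mutual information I(X_S; Y) with the label, so each marginal contribution is the CMI I(X_i; Y | X_S) = I(X_{S∪{i}}; Y) − I(X_S; Y), averaged with the usual Shapley weights. The plug-in estimator and the names `mutual_info` and `shapley_cmi` are illustrative assumptions, not the paper's code.

```python
from collections import Counter
from itertools import combinations
from math import factorial, log2


def mutual_info(rows, labels, cols):
    """Plug-in estimate of I(X_cols; Y) in bits from discretized feature rows."""
    if not cols:
        return 0.0  # an empty feature set carries no information about Y
    n = len(labels)
    keys = [tuple(row[c] for c in cols) for row in rows]
    joint = Counter(zip(keys, labels))
    px, py = Counter(keys), Counter(labels)
    # sum over observed (x, y) of p(x, y) * log2(p(x, y) / (p(x) p(y)))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def shapley_cmi(rows, labels, features):
    """Shapley value per feature, using I(X_i; Y | X_S) as the marginal contribution (assumed value function)."""
    m = len(features)
    values = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(m):  # coalition sizes 0 .. m-1
            weight = factorial(k) * factorial(m - k - 1) / factorial(m)
            for s in combinations(others, k):
                # chain rule: I(X_{S+i}; Y) - I(X_S; Y) = I(X_i; Y | X_S)
                cmi = mutual_info(rows, labels, list(s) + [i]) - mutual_info(rows, labels, list(s))
                total += weight * cmi
        values[i] = total
    return values


# Hypothetical toy data: Y = a XOR b, so neither feature is informative alone.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 1, 1, 0]
print(shapley_cmi(rows, labels, ["a", "b"]))  # → {'a': 0.5, 'b': 0.5}
```

Each feature alone has zero mutual information with the XOR label, yet the CMI terms split the full 1 bit evenly between them, which is the kind of synergy a model-free valuation needs to catch. The loop is exponential in the number of features, which is only reasonable for a handful of parties.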
Real-World Use Ideas
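- 💡 A financial institution (the money pros) and an e-commerce site (online shop) could team up to build AI that catches bad actors! ✨
- 💡 If a hospital 🏥 and a pharma company 💊 team up to build AI that predicts disease risk, everyone gets healthier!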

Privacy-Preserving Feature Valuation in Vertical Federated Learning Using Shapley-CMI and PSI Permutation

Unai Laskurain / Aitor Aguirre-Ortuzar / Urko Zurutuza

Federated Learning (FL) is an emerging machine learning paradigm that enables multiple parties to collaboratively train models without sharing raw data, ensuring data privacy. In Vertical FL (VFL), where each party holds different features for the same users, a key challenge is to evaluate the feature contribution of each party before any model is trained, particularly in the early stages when no model exists. To address this, the Shapley-CMI method was recently proposed as a model-free, information-theoretic approach to feature valuation using Conditional Mutual Information (CMI). However, its original formulation did not provide a practical implementation capable of computing the required permutations and intersections securely. This paper presents a novel privacy-preserving implementation of Shapley-CMI for VFL. Our system introduces a private set intersection (PSI) server that performs all necessary feature permutations and computes encrypted intersection sizes across discretized and encrypted ID groups, without the need for raw data exchange. Each party then uses these intersection results to compute Shapley-CMI values, computing the marginal utility of their features. Initial experiments confirm the correctness and privacy of the proposed system, demonstrating its viability for secure and efficient feature contribution estimation in VFL. This approach ensures data confidentiality, scales across multiple parties, and enables fair data valuation without requiring the sharing of raw data or training models.
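
The abstract's data flow (discretize each feature, group user IDs by bin, encrypt the IDs, let a server count intersections, then compute information measures locally) can be sketched in Python as below. This is a toy stand-in, not the authors' protocol: IDs are blinded with HMAC-SHA256 under a key the data parties share and the server never sees, which is much weaker than a real PSI protocol (for example OPRF- or Diffie-Hellman-based PSI), and the counts come back in the clear rather than encrypted. The names `blind_groups`, `intersection_sizes` and `mi_from_counts`, the key, and the toy data are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict
from itertools import product
from math import log2


def blind_groups(groups, key):
    """Party side: replace raw IDs in each discretized bin with HMAC-SHA256 tokens."""
    return {
        bin_label: {hmac.new(key, str(uid).encode(), hashlib.sha256).hexdigest() for uid in ids}
        for bin_label, ids in groups.items()
    }


def intersection_sizes(*blinded_parties):
    """Server side: count shared tokens for every combination of bins across parties."""
    sizes = {}
    for combo in product(*(p.items() for p in blinded_parties)):
        bins = tuple(bin_label for bin_label, _ in combo)
        sizes[bins] = len(set.intersection(*(tokens for _, tokens in combo)))
    return sizes


def mi_from_counts(sizes):
    """Party side: plug-in I(X; Y) in bits; the last element of each key is the label bin."""
    n = sum(sizes.values())
    px, py = defaultdict(int), defaultdict(int)
    for bins, c in sizes.items():
        px[bins[:-1]] += c
        py[bins[-1]] += c
    return sum(
        (c / n) * log2(c * n / (px[b[:-1]] * py[b[-1]]))
        for b, c in sizes.items()
        if c > 0  # empty intersections contribute nothing
    )


# Hypothetical toy data: a bank bins ages, the label holder groups IDs by fraud label.
key = b"shared-by-parties-hidden-from-server"
bank = {"age<40": {1, 2, 3}, "age>=40": {4, 5, 6}}
labels = {"fraud": {1, 2}, "ok": {3, 4, 5, 6}}
sizes = intersection_sizes(blind_groups(bank, key), blind_groups(labels, key))
print(sizes)  # 4 bin-pair counts, e.g. ('age<40', 'fraud') -> 2
print(round(mi_from_counts(sizes), 4))  # → 0.4591
```

The joint counts are all a party needs for a plug-in mutual-information estimate, the building block of Shapley-CMI, so nobody ever has to see another party's raw rows.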

cs / cs.CR / cs.AI / cs.DC
