Ultra-short summary: all kinds of AI models end up agreeing on how they represent matter 💖
✨ Gyaru-style sparkle points ✨
● Totally different AI models learn basically the same representations of matter — isn't that kinda moving? 🥺
● AI is crushing it in the science world too — seriously hot! 🔥
● Drug discovery and materials design leveling up like crazy — the future's nothing but exciting! 🚀
Detailed explanation
● Background: Cutting-edge foundation models (FMs) are making big waves in science! They're used across tons of fields, but each model handles data differently and has different strengths, right? So this study asks: if every model understood matter and such in the same way, wouldn't that be super convenient?
● Method: Using a whole bunch of FMs, they checked whether representations of matter (molecules, materials, etc.) end up with similar structure regardless of model type or training data! They used quantitative metrics to measure how "alike" the representations are, and also tracked how the models evolve.
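The summary above doesn't name the similarity index the authors used, but a standard choice for comparing learned representations across models is linear centered kernel alignment (CKA), which is invariant to orthogonal rotation and isotropic scaling of the embedding space. Here is a minimal illustrative sketch, with made-up random data standing in for two models' embeddings:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices of shape
    (n_samples, n_features). Returns a value in [0, 1]; 1 means the
    representations match up to rotation and isotropic scaling."""
    X = X - X.mean(axis=0)  # center each feature
    Y = Y - Y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 64))                 # "model 1" embeddings
R, _ = np.linalg.qr(rng.normal(size=(64, 64))) # random orthogonal matrix
B = A @ R                                      # same representation, rotated
C = rng.normal(size=(100, 64))                 # unrelated representation

print(linear_cka(A, B))  # ≈ 1.0: CKA is invariant to rotation
print(linear_cka(A, C))  # noticeably lower for unrelated features
```

Computing this score for every pair of models over a shared set of inputs gives a model-by-model alignment matrix, which is the kind of quantity a convergence study like this one can track.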
Machine learning models of vastly different modalities and architectures are being trained to predict the behavior of molecules, materials, and proteins. However, it remains unclear whether they learn similar internal representations of matter. Understanding their latent structure is essential for building scientific foundation models that generalize reliably beyond their training domains. Although representational convergence has been observed in language and vision, its counterpart in the sciences has not been systematically explored. Here, we show that representations learned by nearly sixty scientific models, spanning string-, graph-, 3D-atomistic-, and protein-based modalities, are highly aligned across a wide range of chemical systems. Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in representation space as they improve in performance, suggesting that foundation models learn a common underlying representation of physical reality. We then show two distinct regimes of scientific models: on inputs similar to those seen during training, high-performing models align closely and weak models diverge into local sub-optima in representation space; on structures vastly different from those seen during training, nearly all models collapse onto a low-information representation, indicating that today's models remain limited by training data and inductive bias and do not yet encode truly universal structure. Our findings establish representational alignment as a quantitative benchmark for foundation-level generality in scientific models. More broadly, our work can be used to track the emergence of universal representations of matter as models scale, and to select and distill models whose learned representations transfer best across modalities, domains of matter, and scientific tasks.