Published: 2025/10/23 9:01:44

ComProScanner is the best! Leveling up IT with materials data extraction ✨

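Super summary: An awesome tool that extracts materials data from scientific papers! Turbo-charged support for materials development at IT companies 🚀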

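🌟 Gyaru-style sparkle points
● Automatically organizes data from papers! Bye-bye, manual work 👋
● Materials development speeds way up! Finding new materials becomes a breeze 💖
● Built for IT companies! Feels like business opportunities are about to explode 🎉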

Here comes the detailed explanation!

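Background: In materials science, experimental data ends up buried in papers, so collecting it has been a real pain 🥲 But thanks to LLMs (large language models), extracting information from text keeps getting better! And tons of people in the IT industry want to use AI for materials development, right? That's why high-quality datasets are absolutely essential!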


ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature

Aritra Roy / Enrico Grisan / John Buckeridge / Chiara Gattinoni

Since the advent of various pre-trained large language models, extracting structured knowledge from scientific text has experienced a revolutionary change compared with traditional machine learning or natural language processing techniques. Despite these advances, accessible automated tools that allow users to construct, validate, and visualise datasets from scientific literature extraction remain scarce. We therefore developed ComProScanner, an autonomous multi-agent platform that facilitates the extraction, validation, classification, and visualisation of machine-readable chemical compositions and properties, integrated with synthesis data from journal articles for comprehensive database creation. We evaluated our framework using 100 journal articles against 10 different LLMs, including both open-source and proprietary models, to extract highly complex compositions associated with ceramic piezoelectric materials and corresponding piezoelectric strain coefficients (d33), motivated by the lack of a large dataset for such materials. DeepSeek-V3-0324 outperformed all models with a significant overall accuracy of 0.82. This framework provides a simple, user-friendly, readily-usable package for extracting highly complex experimental data buried in the literature to build machine learning or deep learning datasets.

cs / physics.comp-ph / cs.LG