Kids' medical AI has arrived! 👶✨ A benchmark to boost accuracy!

Published: 2025/8/25 6:52:57

Super summary: "PediatricsMQA", a new test for making pediatric medical AI better, is here! It aims for fair AI by getting rid of age bias 💖

Gyaru-style sparkle points ✨

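● So pediatric medical AI was still a work in progress?! 🥺
● Information tailored to age and developmental stage really matters! 💯
● The benchmark is public, so everyone can join the research, which is amazing 💖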

Detailed explanation

PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark

Adil Bahaj / Mohamed Chetouani / Mounir Ghogho

Large language models (LLMs) and vision-augmented LLMs (VLMs) have significantly advanced medical informatics, diagnostics, and decision support. However, these models exhibit systematic biases, particularly age bias, which compromises their reliability and equity. This is evident in their poorer performance on pediatric-focused text and visual question-answering tasks. The bias reflects a broader imbalance in medical research, where pediatric studies receive less funding and representation despite the significant disease burden in children. To address these issues, the authors introduce PediatricsMQA, a comprehensive multi-modal pediatric question-answering benchmark. It consists of 3,417 text-based multiple-choice questions (MCQs) covering 131 pediatric topics across seven developmental stages (prenatal to adolescent) and 2,067 vision-based MCQs using 634 pediatric images from 67 imaging modalities and 256 anatomical regions. The dataset was developed using a hybrid manual-automatic pipeline that incorporates peer-reviewed pediatric literature, validated question banks, existing benchmarks, and existing QA resources. Evaluating state-of-the-art open models, the authors find dramatic performance drops in younger cohorts, highlighting the need for age-aware methods to ensure equitable AI support in pediatric care.
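To make the "performance drops in younger cohorts" idea concrete, here is a minimal sketch of how MCQ accuracy could be broken down by developmental stage. It is not the authors' evaluation code: the record fields (`stage`, `answer`, `predicted`), the function name `accuracy_by_stage`, and the toy records are all hypothetical and made up for illustration, not data or results from the paper.

```python
from collections import defaultdict


def accuracy_by_stage(records):
    """Return {stage: fraction of correctly answered MCQs} for a list of records.

    Each record is a dict with hypothetical keys:
    "stage" (developmental stage label), "answer" (gold option),
    and "predicted" (the model's chosen option).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["stage"]] += 1
        if record["predicted"] == record["answer"]:
            correct[record["stage"]] += 1
    # Only stages that appear in the input are reported.
    return {stage: correct[stage] / total[stage] for stage in total}


# Toy, invented records purely for illustration (NOT results from the paper).
toy_records = [
    {"stage": "prenatal", "answer": "B", "predicted": "A"},
    {"stage": "prenatal", "answer": "C", "predicted": "C"},
    {"stage": "adolescent", "answer": "A", "predicted": "A"},
    {"stage": "adolescent", "answer": "D", "predicted": "D"},
]

per_stage = accuracy_by_stage(toy_records)
print(per_stage)  # {'prenatal': 0.5, 'adolescent': 1.0}
```

Reporting one accuracy per stage, rather than a single overall score, is what makes an age-related gap visible: an aggregate number can look fine while the youngest groups lag behind.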

cs / cs.CY / cs.AI / cs.CL / cs.GR / cs.MM
