Title & ultra-short summary (within 15 chars): "SketchJudge," an AI that grades hand-drawn diagrams, is here! 💯
Sparkle points ✨ ×3 ● An era where AI diagnoses errors in hand-drawn diagrams! 😳 ● Supporting STEM education with AI 💖 ● With AI that understands diagrams, big efficiency gains at work are no longer a dream 🌟
Detailed explanation
Real-world use-case ideas 💡 ×2 ● An app where an AI teacher grades the diagrams in your homework in seconds and points out your mistakes 📱 ● CAD software with built-in AI that auto-generates a proper blueprint from a rough hand-drawn sketch 💻
Read the rest in the "らくらく論文" app
While Multimodal Large Language Models (MLLMs) have achieved remarkable progress in visual understanding, they often struggle when faced with the unstructured and ambiguous nature of human-generated sketches. This limitation is particularly pronounced in the underexplored task of visual grading, where models should not only solve a problem but also diagnose errors in hand-drawn diagrams. Such diagnostic capabilities depend on complex structural, semantic, and metacognitive reasoning. To bridge this gap, we introduce SketchJudge, a novel benchmark tailored for evaluating MLLMs as graders of hand-drawn STEM diagrams. SketchJudge encompasses 1,015 hand-drawn student responses across four domains: geometry, physics, charts, and flowcharts, featuring diverse stylistic variations and distinct error types. Evaluations on SketchJudge demonstrate that even advanced MLLMs lag significantly behind humans, validating the benchmark's effectiveness in exposing the fragility of current vision-language alignment in symbolic and noisy contexts. All data, code, and evaluation scripts are publicly available at https://github.com/yuhangsu82/SketchJudge.
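The abstract describes evaluating MLLMs on whether they can diagnose the error in each of the 1,015 hand-drawn responses, split across four domains. A minimal sketch of how such a per-domain grading-accuracy score might be computed is below; the record fields (`domain`, `gold_error`, `pred_error`) and the toy data are my assumptions for illustration, not the benchmark's actual schema — see the linked repository for the real evaluation scripts.

```python
from collections import defaultdict

# Toy records in a hypothetical SketchJudge-style format
# (field names and values are assumptions, not the real schema).
records = [
    {"domain": "geometry",   "gold_error": "wrong_angle",   "pred_error": "wrong_angle"},
    {"domain": "geometry",   "gold_error": "missing_label", "pred_error": "none"},
    {"domain": "physics",    "gold_error": "wrong_vector",  "pred_error": "wrong_vector"},
    {"domain": "charts",     "gold_error": "axis_swapped",  "pred_error": "axis_swapped"},
    {"domain": "flowcharts", "gold_error": "bad_branch",    "pred_error": "missing_arrow"},
]

def per_domain_accuracy(records):
    """Fraction of diagrams whose predicted error type matches the gold label, per domain."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["domain"]] += 1
        hits[r["domain"]] += int(r["pred_error"] == r["gold_error"])
    return {d: hits[d] / totals[d] for d in totals}

print(per_domain_accuracy(records))
# e.g. {'geometry': 0.5, 'physics': 1.0, 'charts': 1.0, 'flowcharts': 0.0}
```

Reporting accuracy per domain rather than one pooled number matters here, since the abstract notes that model fragility varies with the symbolic and noisy character of each diagram type.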