Scoring long essays gets blazing fast and super cute with AI! ✨

Published: 2026/1/7 2:01:27

Super-short summary: a study on getting AI to score long essays!

● AI that scores long essays for you? Super handy, right? 😍
● They combined several AI models so the scoring comes out more accurate! 😎
● IT companies could put this to use in education and content creation! 💖

Detailed explanation

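Background

Scoring long essays is rough on teachers, right? 😫 It eats up time and it's exhausting... If AI took that over, life would get way easier! ✨ This study worked on the technology to make that happen 💖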

Empirical Comparison of Encoder-Based Language Models and Feature-Based Supervised Machine Learning Approaches to Automated Scoring of Long Essays

Kuo Wang (Southern Methodist University) / Haowei Hua (Princeton University) / Pengfei Yan (University of Maryland) / Hong Jiao (University of Maryland) / Dan Song (University of Iowa)

Long context may impose challenges for encoder-only language models in text processing, specifically for automated scoring of essays. This study trained several commonly used encoder-based language models for automated scoring of long essays. The performance of these trained models was evaluated and compared with the ensemble models built upon the base language models with a token limit of 512. The experimented models include BERT-based models (BERT, RoBERTa, DistilBERT, and DeBERTa), ensemble models integrating embeddings from multiple encoder models, and ensemble models of feature-based supervised machine learning models, including Gradient-Boosted Decision Trees, eXtreme Gradient Boosting, and Light Gradient Boosting Machine. We trained, validated, and tested each model on a dataset of 17,307 essays, with an 80%/10%/10% split, and evaluated model performance using Quadratic Weighted Kappa. This study revealed that an ensemble-of-embeddings model that combines multiple pre-trained language model representations, with a gradient-boosting classifier as the ensemble model, significantly outperforms individual language models at scoring long essays.
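
The abstract names Quadratic Weighted Kappa (QWK) as the evaluation metric without spelling it out. As a minimal sketch, assuming integer scores on a known range, QWK can be computed as 1 - (sum of w_ij * O_ij) / (sum of w_ij * E_ij), where O is the observed confusion matrix, E is the matrix expected by chance from the two score histograms, and w_ij = (i - j)^2 / (k - 1)^2 for k score levels. The function name, the `min_score` and `max_score` parameters, and the toy scores are illustrative assumptions, not taken from the paper.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_score, max_score):
    """Quadratic Weighted Kappa between two lists of integer scores.

    min_score and max_score are hypothetical parameters describing the
    rubric range; the source does not state the range it used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    k = max_score - min_score + 1  # number of score levels
    if k < 2:
        raise ValueError("need at least two score levels")
    for s in list(y_true) + list(y_pred):
        if not min_score <= s <= max_score:
            raise ValueError(f"score {s} is outside [{min_score}, {max_score}]")

    n = len(y_true)

    # Observed confusion matrix: rows are true scores, columns are predictions.
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_score][p - min_score] += 1

    # Marginal histograms, used to build the chance-expected matrix.
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement penalty
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * true_hist[i] * pred_hist[j]
    weighted_expected /= n  # expected count for cell (i, j) is hist_i * hist_j / n

    if weighted_expected == 0:
        # Both raters used one and the same score level everywhere.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Toy example with made-up scores on a 1-6 scale.
human = [1, 2, 3, 4, 5, 6, 3, 4]
model = [1, 2, 3, 4, 5, 5, 4, 4]
print(round(quadratic_weighted_kappa(human, model, 1, 6), 3))
```

Because the penalty grows with the square of the distance between the two scores, being off by two points costs four times as much as being off by one, which suits ordinal essay scores better than plain accuracy.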

cs / cs.CL / cs.LG
