Unveiling the Tapestry of Automated Essay Scoring: A Comprehensive Investigation of Accuracy, Fairness, and Generalizability

Kaixun Yang, Mladen Raković, Yuyang Li, Dragan Gašević, Quanlong Guan, Guangliang Chen

doi:10.1609/aaai.v38i20.30254

Abstract

Automatic Essay Scoring (AES) is a well-established educational pursuit that employs machine learning to evaluate student-authored essays. While much effort has been devoted to this area, current research primarily focuses on either (i) boosting the predictive accuracy of an AES model for a specific prompt (i.e., developing prompt-specific models), which often relies heavily on labeled data from the same target prompt; or (ii) assessing the applicability of AES models developed on non-target prompts to the intended target prompt (i.e., developing AES models in a cross-prompt setting). Given the inherent bias in machine learning and its potential impact on marginalized groups, it is imperative to investigate whether such bias exists in current AES methods and, if so, how it interacts with an AES model's accuracy and generalizability. Thus, our study aimed to uncover the relationship between an AES model's accuracy, fairness, and generalizability, contributing practical insights for developing effective AES models in real-world education. To this end, we selected nine prominent AES methods and evaluated their performance using seven distinct metrics on an open-sourced dataset, which contains over 25,000 essays and various demographic information about students, such as gender, English language learner status, and economic status. Through extensive evaluations, we demonstrated that: (1) prompt-specific models tend to outperform their cross-prompt counterparts in terms of predictive accuracy; (2) prompt-specific models frequently exhibit a greater bias towards students of different economic statuses compared to cross-prompt models; (3) in the pursuit of generalizability, traditional machine learning models (e.g., SVM) coupled with carefully engineered features hold greater potential for achieving both high accuracy and fairness than complex neural network models.
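The abstract does not name the seven metrics, so the following is only a minimal sketch of how accuracy and fairness might be measured side by side. It assumes quadratic weighted kappa (a metric widely used in AES work) for accuracy, and a simple gap in mean signed scoring error between demographic groups as a stand-in fairness measure. The function names, the toy scores, and the group labels are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Agreement between two integer score lists; larger disagreements are penalised quadratically."""
    n = max_score - min_score + 1
    # Confusion matrix of human score (rows) against model score (columns).
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1
    total = len(human)
    hist_h = [sum(row) for row in observed]
    hist_m = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2  # the usual 1/(n-1)^2 factor cancels in the ratio
            num += w * observed[i][j]
            den += w * hist_h[i] * hist_m[j] / total  # expected count under independence
    if den == 0:  # both raters gave one identical constant score
        return 1.0
    return 1.0 - num / den


def mean_error_gap(human, model, groups):
    """Assumed fairness proxy: spread of mean signed error (model - human) across groups."""
    errors = defaultdict(list)
    for h, m, g in zip(human, model, groups):
        errors[g].append(m - h)
    means = {g: sum(e) / len(e) for g, e in errors.items()}
    return means, max(means.values()) - min(means.values())


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    human = [2, 3, 2, 3]
    model = [2, 3, 3, 4]
    groups = ["a", "a", "b", "b"]  # hypothetical demographic labels
    print(quadratic_weighted_kappa(human, model, 1, 4))
    print(mean_error_gap(human, model, groups))
```

A model can score well on the first function while showing a large gap on the second, which is the kind of tension between accuracy and fairness that the study sets out to examine.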

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 1

Similar Papers

A Comprehensive Review of Automated Essay Scoring (AES) Research and Development
Chun Then Lim ... Nung Kion Lee
Pertanika Journal of Science and Technology | VOL. 29
31 Jul 2021

Comparative Performance of Autoencoders and Traditional Machine Learning Algorithms in Clinical Data Analysis for Predicting Post-Staged GKRS Tumor Dynamics.
Simona Ruxandra Volovăț ... Cristian Constantin Volovăț
Diagnostics (Basel, Switzerland) | VOL. 14
21 Sep 2024

Handbook of Automated Essay Evaluation
Mark D Shermis
18 Jul 2013

A crowdsourcing-based incremental learning framework for automated essays scoring
Huanyu Bai ... Siu Cheung Hui
Expert Systems with Applications | VOL. 238
28 Sep 2023
