Abstract
This study explored how artificial intelligence (AI) tools compare with human evaluators in assessing essays written by students in a writing course. Using a dataset of 30 essays written by English as a foreign language (EFL) students, the AI tools' evaluations were compared with those of human evaluators to examine whether the AI evaluations differed with respect to the quality of the essay as a whole or within specific categories (i.e., content, vocabulary, organization, and accuracy). The results indicated that the AI tools provided high-quality feedback to students across all categories despite differences in essay quality. The AI tools also differed from the human raters in the scores they assigned, consistently grading lower across multiple evaluation categories while providing more detailed feedback. Finally, the scores each AI tool assigned to student essays in the individual assessment categories did not differ significantly from the overall scores assigned by the AI tools.