Abstract
This study explored how artificial intelligence (AI) tools compare with human evaluators in assessing essays written by students in a writing course. Using a dataset of 30 essays written by English as a foreign language (EFL) students, the AI tools' evaluations were compared with those of human evaluators to examine whether the AI evaluations differed with respect to the quality of the essay as a whole or within specific categories (i.e., content, vocabulary, organization, and accuracy). The results indicated that the AI tools provided high-quality feedback to students across all categories despite differences in essay quality. The AI tools also differed from the human raters in the scores they assigned, consistently grading lower across multiple evaluation categories while providing more detailed feedback. Finally, the scores each AI tool assigned to student essays in the individual assessment categories did not differ significantly from the overall scores assigned by the AI tools.