This study explores how closely an automated essay grading (AEG) system built with Natural Language Processing (NLP) aligns with human graders in assessing different types of essays. Using essays from 35 information technology (IT) students that were manually scored by human raters, the system's performance was evaluated with statistical measures including weighted Cohen’s Kappa and the Friedman test. The results showed moderate to substantial agreement between the automated essay grading system and human scores across essay types (argumentative, comparison and contrast, descriptive, narrative, and persuasive), suggesting that the system can reliably handle various writing styles. In addition, no significant differences in grading reliability were found across essay formats, indicating that the system adapts well to different types of essays. These findings suggest that NLP-based essay grading systems could be valuable in educational settings, especially in large classes with high grading demands. While the system shows strong potential for academic assessment, further testing with a broader dataset is recommended to improve its generalizability and to address complex writing elements such as creativity and tone. This study contributes to educational technology by presenting a practical, scalable approach to consistent and objective essay grading, paving the way for broader use of automated grading tools in education.
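To make the reported agreement analysis concrete, the sketch below shows one way such an evaluation could be computed in Python with scikit-learn and SciPy. It is not the authors' code: the score scale (0–5), the simulated score matrices, and the choice to run the Friedman test on per-student absolute score differences are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): agreement analysis with
# weighted Cohen's Kappa and a Friedman test, assuming hypothetical
# integer scores (0-5) from the AEG system and one human rater for each
# of 35 students on five essay types.
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
essay_types = ["argumentative", "comparison_contrast", "descriptive",
               "narrative", "persuasive"]
n_students = 35

# Simulated score matrices: rows = students, columns = essay types.
human = rng.integers(0, 6, size=(n_students, len(essay_types)))
auto = np.clip(human + rng.integers(-1, 2, size=human.shape), 0, 5)

# Weighted Cohen's Kappa per essay type (quadratic weights penalize
# larger score disagreements more heavily).
for j, name in enumerate(essay_types):
    kappa = cohen_kappa_score(human[:, j], auto[:, j], weights="quadratic")
    print(f"{name}: weighted kappa = {kappa:.2f}")

# Friedman test on per-student absolute score differences, as one way to
# check whether human-system agreement varies across essay formats.
diffs = [np.abs(human[:, j] - auto[:, j]) for j in range(len(essay_types))]
stat, p = friedmanchisquare(*diffs)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")
```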