Abstract

This study examines the use of large language models (LLMs), such as ChatGPT-4, for the automated evaluation of student essays, focusing on a case study conducted at the Swiss Institute of Business Administration. It explores how effectively LLMs assess German-language student transfer assignments and contrasts their performance with traditional evaluations by human lecturers. The primary findings highlight the challenges LLMs face in grading complex texts accurately against predefined categories and in providing detailed feedback. The research illuminates the gap between the capabilities of LLMs and the nuanced requirements of student essay evaluation. The conclusion emphasizes the need for continued research and development of LLM technology to improve the accuracy, reliability, and consistency of automated essay assessment in educational contexts.
