Abstract
Automated essay scoring (AES) is gaining increasing attention in the education sector, as it significantly reduces the burden of manual scoring and allows ad hoc feedback for learners. Natural language processing (NLP) based on machine learning has been shown to be particularly suitable for text classification and AES. While many machine-learning approaches for AES still rely on a bag-of-words (BOW) approach, we consider a transformer-based approach in this paper, compare its performance to that of a logistic regression model based on the BOW approach, and discuss their differences. The analysis is based on 2088 email responses to a problem-solving task that were manually labeled in terms of politeness. Both transformer models considered in the analysis outperformed the regression-based model without any hyperparameter tuning. We argue that, for AES tasks such as politeness classification, the transformer-based approach has significant advantages, while a BOW approach suffers from not taking word order into account and from reducing words to their stems. Further, we show how such models can help increase the accuracy of human raters, and we provide detailed instructions on how to implement transformer-based models for one’s own purposes.
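To make the baseline concrete, the kind of BOW regression model the paper compares against could be sketched as below. This is a minimal illustration, not the authors' implementation: the labeled emails and the `polite`/`impolite` labels here are invented for demonstration, and the actual data set of 2088 manually labeled responses is not reproduced.

```python
# Minimal BOW baseline sketch for politeness classification.
# The toy emails and labels below are hypothetical; the paper's real
# data set (2088 labeled email responses) is not public here.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: 1 = polite, 0 = impolite (invented labels).
emails = [
    "Dear team, could you kindly look into this issue? Thank you very much.",
    "Hello, I would appreciate your help with the attached report. Best regards.",
    "Fix this now. I have no time for excuses.",
    "Why is this still broken? Do your job.",
    "Good morning, please let me know when it suits you. Many thanks.",
    "This is unacceptable. Answer me immediately.",
]
labels = [1, 1, 0, 0, 1, 0]

# CountVectorizer builds the term-frequency (BOW) matrix; word order is
# discarded at this step. LogisticRegression then learns one weight per term.
model = make_pipeline(CountVectorizer(lowercase=True),
                      LogisticRegression(max_iter=1000))
model.fit(emails, labels)

new_email = ["Could you please send the file? Thanks a lot."]
pred = model.predict(new_email)          # array with one class label
proba = model.predict_proba(new_email)   # class probabilities, rows sum to 1
```

Because every email is reduced to a vector of term counts, two emails containing the same words in a different order are indistinguishable to this baseline, which is exactly the limitation the abstract attributes to BOW.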
Highlights
Recent developments in natural language processing (NLP) and the progress in machine learning (ML) algorithms have opened the door to new approaches within the educational sector in general and to the measurement of student performance in particular
A long short-term memory (LSTM) model significantly outperforms the two baseline models, based on support vector regression and Bayesian linear ridge regression, and also outperforms models based on the LSTM, the CNN, or gated recurrent units (GRUs) alone
They show that the transformer-based approaches yield results comparable to those of a model combining an LSTM and a CNN
Summary
Recent developments in natural language processing (NLP) and the progress in machine learning (ML) algorithms have opened the door to new approaches within the educational sector in general and to the measurement of student performance in particular. Intelligent tutoring systems, plagiarism-detecting software, or helpful chatbots are just a few examples of how ML is currently used to support learners and teachers [1]. An important part of providing personalized feedback and supporting students is automated essay scoring (AES), in which algorithms are implemented to classify long text answers in accordance with classifications by human raters [2]. We implement AES using current state-of-the-art language models based on neural networks with a transformer architecture [3,4]. We want to explore two main questions.
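The core difference between the two model families can be shown in a few lines of NumPy. This is a didactic sketch, not the paper's implementation: it uses one-hot "embeddings", sinusoidal positional encodings in the style of the original transformer, and a single untrained self-attention layer. The point is that a BOW vector (and likewise a plain sum of embeddings) is identical for two sentences that reorder the same words, while adding positional encodings lets an attention-based model distinguish them.

```python
# Sketch: BOW is order-invariant; positional encodings restore word order.
import numpy as np

vocab = {"not": 0, "bad": 1, "very": 2, "good": 3}
s1 = ["not", "bad", "very", "good"]   # "not bad, very good"
s2 = ["not", "good", "very", "bad"]   # "not good, very bad" (same words!)

def bow(tokens):
    # Term-count vector: position information is discarded.
    v = np.zeros(len(vocab))
    for t in tokens:
        v[vocab[t]] += 1
    return v

def positional_encoding(n_pos, d_model):
    # Sinusoidal encodings as in "Attention Is All You Need".
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    # Single-head attention with Q = K = V = X, purely for illustration.
    return softmax(X @ X.T / np.sqrt(X.shape[-1])) @ X

def pooled(X):
    # Mean-pooling makes the attention output permutation-invariant
    # unless positions are encoded in the input.
    return self_attention(X).mean(axis=0)

# One-hot token embeddings (illustrative stand-in for learned embeddings).
E1 = np.stack([np.eye(len(vocab))[vocab[t]] for t in s1])
E2 = np.stack([np.eye(len(vocab))[vocab[t]] for t in s2])
PE = positional_encoding(len(s1), len(vocab))

bow_s1, bow_s2 = bow(s1), bow(s2)  # identical vectors
# pooled(E1) == pooled(E2): without positions, attention cannot tell
# the sentences apart. pooled(E1 + PE) != pooled(E2 + PE): with
# positional encodings, the reordered sentence gets a distinct output.
```

Because self-attention is permutation-equivariant, the positional encoding is what injects word-order information; this is the property that the BOW baseline in the paper lacks by construction.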