Countering the Influence of Essay Length in Neural Essay Scoring

Sungho Jeon, Michael Strube

doi:10.18653/v1/2021.sustainlp-1.4

Abstract

Previous work has shown that automated essay scoring systems, in particular machine learning-based systems, are not capable of assessing the quality of essays, but instead rely on essay length, a factor irrelevant to writing proficiency. In this work, we first show that state-of-the-art systems, recent neural essay scoring systems, might also be influenced by the correlation between essay length and scores in a standard dataset. In our evaluation, a very simple neural model achieves state-of-the-art performance on the standard dataset. To consider essay content without taking essay length into account, we introduce a simple neural model assessing the similarity of content between an input essay and essays assigned different scores. This neural model achieves performance comparable to the state of the art on a standard dataset as well as on a second dataset. Our findings suggest that neural essay scoring systems should consider the characteristics of datasets to focus on text quality.

Highlights

Automated essay scoring (AES) is the task of assigning a score for a given essay, aiming to replicate human scoring results.
The Test of English as a Foreign Language dataset (TOEFL, Blanchard et al. (2013)) has a lower correlation between essay length and scores.
Second, we demonstrate that considering essay ...
AES systems are not capable of assessing the quality of essays (Winerip, 2005; Ben-Simon and Bennett, 2007; Wolfe et al., 2016), but work ...
We demonstrate that this neural model achieves performance comparable to the state of the art on both datasets.
... Automated Student Assessment Prize (ASAP), we view this as evidence that the performance of previous neural models might be influenced by the correlation of essay length and scores in the target dataset.

Summary

Introduction

Automated essay scoring (AES) is the task of assigning a score for a given essay, aiming to replicate human scoring results. ... essay length and scores in the standard dataset leads to top performance. Recent neural essay scoring systems, which do not employ a feature capturing essay length explicitly, achieve state-of-the-art performance. The Test of English as a Foreign Language dataset (TOEFL, Blanchard et al. (2013)) has a lower correlation between essay length and scores. Second, we demonstrate that considering essay ...
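The correlation between essay length and scores can be measured before any model is trained. The following is a minimal sketch, not the authors' procedure: it assumes essay length is the number of whitespace-separated tokens and uses the Pearson coefficient. The helper names (`pearson`, `length_score_correlation`) and the toy essays are hypothetical and are not drawn from ASAP or TOEFL.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def length_score_correlation(essays, scores):
    """Correlate whitespace-token counts (an assumed length measure) with scores."""
    lengths = [len(essay.split()) for essay in essays]
    return pearson(lengths, scores)


# Hypothetical toy data for illustration only.
toy_essays = [
    "Short answer.",
    "A somewhat longer answer with a few more words.",
    "A much longer answer that keeps adding words, clauses, and examples to fill space.",
]
toy_scores = [1, 2, 3]

r = length_score_correlation(toy_essays, toy_scores)
print(f"length-score correlation: {r:.3f}")
```

A coefficient near 1 on a dataset would suggest that a scorer could do well from length alone, which is the kind of dataset characteristic the paper argues should be checked. Other length measures (characters, sentences) or a rank correlation could be substituted without changing the idea.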

Results

Conclusion