Abstract

Automated Essay Scoring (AES) is a critical text regression task that automatically assigns scores to essays based on their writing quality. Recently, the performance of sentence prediction tasks has been largely improved by pre-trained language models, via fusing representations from different layers, constructing an auxiliary sentence, using multi-task learning, etc. However, to solve the AES task, previous works utilize shallow neural networks to learn essay representations and constrain the calculated scores with either a regression loss or a ranking loss. Shallow neural networks trained on limited samples struggle to capture the deep semantics of texts, and without an accurate scoring function, ranking loss and regression loss measure two different aspects of the calculated scores. To improve AES performance, we find a new way to fine-tune pre-trained language models with multiple losses of the same task. In this paper, we propose to first utilize a pre-trained language model to learn text representations. With scores calculated from the representations, a mean square error loss and a batch-wise ListNet loss with dynamic weights constrain the scores simultaneously. We use Quadratic Weighted Kappa to evaluate our model on the Automated Student Assessment Prize dataset. Our model outperforms not only state-of-the-art neural models by nearly 3 percent but also the latest statistical model. Especially on the two narrative prompts, our model performs much better than all the other state-of-the-art models.
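
The sketch below gives a minimal illustration, in PyTorch, of how a mean square error loss and a batch-wise ListNet loss could be combined with a dynamic weight; the linear weighting schedule and which term receives the growing weight are assumptions for illustration, not the exact configuration reported in the paper.

import torch
import torch.nn.functional as F

def listnet_loss(pred_scores, true_scores):
    # Batch-wise ListNet: cross entropy between the top-one probability
    # distributions induced by the gold scores and the predicted scores.
    true_probs = F.softmax(true_scores, dim=0)
    log_pred_probs = F.log_softmax(pred_scores, dim=0)
    return -(true_probs * log_pred_probs).sum()

def combined_loss(pred_scores, true_scores, epoch, total_epochs):
    # pred_scores and true_scores are 1-D tensors for one batch of essays.
    # Dynamic weight: a simple linear schedule over training epochs
    # (assumed here purely for illustration).
    gamma = epoch / max(total_epochs - 1, 1)
    regression = F.mse_loss(pred_scores, true_scores)
    ranking = listnet_loss(pred_scores, true_scores)
    return gamma * regression + (1.0 - gamma) * ranking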

Highlights

  • Automated Essay Scoring (AES) automatically evaluates the writing quality of essays

  • Before introducing our new way to use pre-trained language models, we first briefly review existing work on AES

  • Measured by Quadratic Weighted Kappa (QWK), our model outperforms state-of-the-art neural models by nearly 3 percent on the average QWK score over all eight prompts and performs better than the latest statistical model (a QWK computation sketch follows below)
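
As referenced above, QWK can be computed with scikit-learn's cohen_kappa_score using quadratic weights; the gold and predicted scores below are hypothetical values for illustration only.

from sklearn.metrics import cohen_kappa_score

# Hypothetical integer scores for a handful of essays from one prompt.
gold_scores = [2, 3, 4, 3, 1, 4]
pred_scores = [2, 3, 3, 3, 2, 4]

# Quadratic Weighted Kappa penalizes larger disagreements more heavily.
qwk = cohen_kappa_score(gold_scores, pred_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")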

Summary

Introduction

Automated Essay Scoring (AES) automatically evaluates the writing quality of essays. Evaluating essay assignments costs a lot of time. Pre-trained language models have shown an extraordinary ability of representation and generalization, and have achieved better performance on many downstream tasks such as text classification and regression. Sun et al. (2019b) summarized several fine-tuning methods, including fusing text representations from different layers, utilizing multi-task learning, etc. Existing works utilize different methods to learn text representations and constrain scores, which are the two key steps in AES models: one is how to learn better essay representations to evaluate writing quality, and the other is how to learn a more accurate score mapping function. We propose a new method called multi-loss to fine-tune BERT models on AES tasks. To show the effectiveness of self-attention in the BERT model, we illustrate the weights of different words on two examples, one argumentative essay and one narrative essay.
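
As a rough illustration of the representation-learning step, the sketch below uses a pre-trained BERT encoder to produce an essay representation that a linear head maps to a score. The model name, the [CLS] pooling choice, the linear head, and the example essay text are assumptions for illustration; in the proposed approach the encoder and head would be fine-tuned jointly with the combined loss described above.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(encoder.config.hidden_size, 1)

essay = "Laughter is a key part of any strong relationship ..."  # hypothetical essay text
inputs = tokenizer(essay, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state
essay_repr = hidden[:, 0, :]                    # [CLS] token as the essay representation
score = torch.sigmoid(score_head(essay_repr))   # normalized score in (0, 1)
print(score.item())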

Related Works
R2BERT
Self-attention
Feature Extraction
Regression
Batchwise Learning to Rank Model
Combination of Regression and Ranking
Experiment
Dataset
Experiment Settings
Evaluation Metric
Average
Baselines and Implementation Details
Experiment Results and Analysis
Runtime and Memory
Conclusion and Future Works