Abstract

Existing approaches for automated essay scoring and document representation learning typically rely on discourse parsers to incorporate discourse structure into text representation. However, the performance of parsers is not always adequate, especially when they are used on noisy texts, such as student essays. In this paper, we propose an unsupervised pre-training approach to capture discourse structure of essays in terms of coherence and cohesion that does not require any discourse parser or annotation. We introduce several types of token, sentence and paragraph-level corruption techniques for our proposed pre-training approach and augment masked language modeling pre-training with our pre-training method to leverage both contextualized and discourse information. Our proposed unsupervised approach achieves a new state-of-the-art result on the task of essay Organization scoring.

Highlights

  • Automated Essay Scoring (AES), the task of both grading and evaluating written essays using machine learning techniques, is an important educational application of natural language processing (NLP)

  • In this paper, we extend our previous research by introducing new corruption techniques and by enhancing a document encoder with our Discourse Corruption (DC) pre-training to capture discourse structure of essay Organization

  • We anticipate that since we do not change the position of the discourse indicators (DIs) during shuffling, the encoder might learn only the sequence of DIs within each essay and try to distinguish between the DI sequence of original and corrupted essays
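
The shuffling described in the last highlight can be illustrated with a minimal sketch. It assumes that each discourse indicator is the first word of its sentence and that "not changing the position of the DIs" means the DI sequence stays fixed while the remaining sentence text is permuted. The indicator list and the function names (`split_indicator`, `shuffle_keep_indicators`) are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import random

# Hypothetical, tiny discourse-indicator (DI) list for illustration only;
# the lexicon actually used in the paper is not given in this text.
DISCOURSE_INDICATORS = ("however", "therefore", "moreover", "first", "finally")


def split_indicator(sentence):
    """Split a sentence into (leading DI or "", remaining text)."""
    head, _, rest = sentence.partition(" ")
    if head.rstrip(",").lower() in DISCOURSE_INDICATORS and rest:
        return head, rest
    return "", sentence


def shuffle_keep_indicators(sentences, seed=0):
    """Shuffle sentence bodies while every DI keeps its original position.

    Assumption: this is one possible reading of sentence-level corruption
    with fixed DIs. Capitalization of moved bodies is not repaired.
    """
    rng = random.Random(seed)
    parts = [split_indicator(s) for s in sentences]
    indicators = [di for di, _ in parts]
    bodies = [body for _, body in parts]
    rng.shuffle(bodies)  # only the bodies move; the DI sequence is untouched
    return [f"{di} {body}" if di else body
            for di, body in zip(indicators, bodies)]


essay = [
    "First, essays need structure.",
    "However, parsers are noisy.",
    "We avoid parsers.",
    "Finally, we pre-train an encoder.",
]
corrupted = shuffle_keep_indicators(essay, seed=1)
```

Under this reading, the original and the corrupted essay share the same DI sequence, so an encoder cannot tell them apart from the indicators alone and has to attend to the content between them.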

Summary

Introduction

Automated Essay Scoring (AES), the task of both grading and evaluating written essays using machine learning techniques, is an important educational application of natural language processing (NLP). An essay is a discourse in which sentences and paragraphs are logically connected to each other to provide comprehensive meaning. Two types of connections have been discussed in the literature: coherence and cohesion [10]. Coherence refers to the semantic relatedness among sentences and the logical order of concepts and meanings in a text. For example, “[...] She was going home.” is coherent whereas “I saw Jill on the street [...]” is not. Two types of coherence are well known in the literature: local coherence and global coherence.

