Abstract

Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which is only one part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that one factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work will facilitate the development of open-domain GEC models that generalize to different topics and genres.

Highlights

  • Grammatical error correction (GEC) is the task of automatically editing text to remove grammatical errors; for example: [A link to registration can be found at on the same page.]

  • Overall F0.5 ranges from around 30 to 52 for most datasets; when the models are evaluated on CWEB and AESW, we observe a substantial drop in performance, with the lowest F0.5 score (6.15) obtained by the PIE system on CWEB-S

  • We release a new GEC benchmark, CWEB, consisting of website text generated by English speakers at varying levels of proficiency
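
For readers unfamiliar with the F0.5 scores quoted above, the following is a minimal sketch of how an F-beta score can be computed from edit-level counts. The function name `f_beta` and the example counts are hypothetical and do not come from the paper; the handling of empty denominators is an assumption, and actual GEC scorers may treat that case differently. With beta = 0.5, precision is weighted more heavily than recall.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Return the F-beta score for edit-level counts.

    tp: proposed edits that match the reference
    fp: proposed edits that do not match the reference
    fn: reference edits the system missed
    Assumption: precision and recall default to 0.0 when undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 10 correct edits, 10 spurious edits, 30 missed edits.
score = f_beta(tp=10, fp=10, fn=30)
print(f"F0.5 = {100 * score:.2f}")  # scaled by 100, as scores are usually reported
```

In this made-up example precision is 0.5 and recall is 0.25, so the score sits closer to precision than a balanced F1 would.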

Summary

Introduction

Grammatical error correction (GEC) is the task of automatically editing text to remove grammatical errors; for example: [A link to registration can be found at on the same page.] GEC systems have so far focused primarily on correcting essays produced by English-as-a-second-language (ESL) learners, providing fast and inexpensive feedback to facilitate language learning. This is only one target domain in the full spectrum of GEC applications. GEC models can also help to improve written communication outside of the formal education setting. Today the largest medium of written communication is the internet, with approximately 380 new websites created every minute. Ensuring the grammatical correctness of websites helps facilitate clear communication and a professional commercial presentation.
