Abstract

There has been an increased interest in data generation approaches to grammatical error correction (GEC) using pseudo data. However, these approaches suffer from several issues that make them inconvenient for real-world deployment, including a demand for large amounts of training data. On the other hand, some errors based on grammatical rules may not necessarily require a large amount of data if GEC models can realize grammatical generalization. This study explores to what extent GEC models generalize the grammatical knowledge required for correcting errors. We introduce an analysis method using synthetic and real GEC datasets with controlled vocabularies to evaluate whether models can generalize to unseen errors. We found that a current standard Transformer-based GEC model fails to realize grammatical generalization even in simple settings with limited vocabulary and syntax, suggesting that it lacks the generalization ability required to correct errors from provided training examples.

Highlights

  • Grammatical Error Correction (GEC) is the task of automatically correcting grammatical errors in a text

  • We investigate five standard error types defined by Bryant et al (2017), all of which are based on grammatical rules: subject-verb agreement errors (VERB:SVA), verb form errors (VERB:FORM), word order errors (WO), morphological errors (MORPH), and noun number errors (NOUN:NUM)

  • This study explored to what extent GEC models generalize grammatical knowledge required for correcting errors

Summary

Introduction

Grammatical Error Correction (GEC) is the task of automatically correcting grammatical errors in a text. Recent data generation approaches to GEC increase the amount of training data using pseudo data without making any modifications to the model architecture (Grundkiewicz et al, 2019; Kiyono et al, 2019). However, these approaches suffer from several issues that make them inconvenient for real-world deployment, including a demand for large amounts of training data. Humans do not need to memorize individual error correction patterns (target terms and their corrections) as long as they have learned the grammatical rules. If GEC models can likewise realize grammatical generalization, some errors based on grammatical rules (e.g., subject-verb agreement errors) do not necessarily require large amounts of data.

[Figure example, Test 2 (Unknown Setting): Every polite cow *smile / smiles awkwardly]
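To make the idea of a controlled-vocabulary evaluation concrete, here is a minimal sketch of how synthetic subject-verb agreement (VERB:SVA) pairs in the style of "Every polite cow *smile / smiles awkwardly" could be generated and split. This is not the authors' pipeline: the word lists, the sentence template, the function names (`make_pair`, `build_datasets`), and the reading of "unknown setting" as "the tested verb never appears in any training pair" are all assumptions made for illustration.

```python
import random

# Hypothetical toy vocabulary (an assumption; not the paper's actual word lists).
DETERMINERS = ["every", "the"]
ADJECTIVES = ["polite", "quiet"]
NOUNS = ["cow", "dog", "teacher", "pilot"]
ADVERBS = ["awkwardly", "slowly"]
# Correct third-person singular form -> erroneous base form.
VERBS = {"smiles": "smile", "walks": "walk", "sings": "sing", "waits": "wait"}


def make_pair(verb_3sg, rng):
    """Return (erroneous source, corrected target) containing one SVA error."""
    prefix = f"{rng.choice(DETERMINERS)} {rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}"
    adverb = rng.choice(ADVERBS)
    source = f"{prefix} {VERBS[verb_3sg]} {adverb}"
    target = f"{prefix} {verb_3sg} {adverb}"
    return source.capitalize(), target.capitalize()


def build_datasets(n_train=20, n_test=5, n_heldout=1, seed=0):
    """Build train / known-test / unknown-test sets with a held-out verb split."""
    rng = random.Random(seed)
    verbs = sorted(VERBS)
    rng.shuffle(verbs)
    unknown_verbs, known_verbs = verbs[:n_heldout], verbs[n_heldout:]

    # Training and the "known" test set only use verbs from the known pool.
    train = [make_pair(rng.choice(known_verbs), rng) for _ in range(n_train)]
    test_known = [make_pair(rng.choice(known_verbs), rng) for _ in range(n_test)]
    # The "unknown" test set only uses verbs that never occur in training.
    test_unknown = [make_pair(rng.choice(unknown_verbs), rng) for _ in range(n_test)]
    return train, test_known, test_unknown, set(unknown_verbs)


if __name__ == "__main__":
    train, test_known, test_unknown, heldout = build_datasets()
    print("held-out verbs:", heldout)
    for source, target in test_unknown[:3]:
        print(f"*{source}  ->  {target}")
```

Under these assumptions, a model that has learned the agreement rule itself should score similarly on `test_known` and `test_unknown`, while a model that memorizes individual correction patterns would be expected to drop on `test_unknown`.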
