Abstract
Recently, several studies have focused on improving the performance of grammatical error correction (GEC) using pseudo data. However, a large amount of pseudo data is required to train an accurate GEC model. To address the limitations of language and computational resources, we hypothesize that introducing pseudo errors into sentences similar to those written by language learners is more efficient than incorporating random pseudo errors into monolingual data. We therefore study the effect of pseudo data on GEC performance using two approaches. First, we extract sentences that are similar to learners' sentences from monolingual data. Second, we generate realistic pseudo errors by considering the error types that learners often make. In our comparative experiments, F0.5 scores on the Russian GEC task improve significantly.
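The first approach, selecting monolingual sentences that resemble learner writing, might be sketched as follows. The abstract does not say which similarity measure is used, so the token-level Jaccard overlap, the `select_similar` function, and the `threshold` value here are all illustrative assumptions rather than the authors' method. The toy sentences are in English only to keep the example readable.

```python
def tokenize(sentence):
    """Lowercase whitespace tokenization (an assumption; real systems use a proper tokenizer)."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Jaccard overlap between two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_similar(monolingual, learner, threshold=0.3):
    """Keep monolingual sentences whose best overlap with any learner sentence
    reaches the (hypothetical) threshold."""
    learner_tokens = [tokenize(s) for s in learner]
    selected = []
    for sentence in monolingual:
        tokens = tokenize(sentence)
        best = max((jaccard(tokens, lt) for lt in learner_tokens), default=0.0)
        if best >= threshold:
            selected.append(sentence)
    return selected


learner_corpus = ["i like to read books"]
monolingual_corpus = [
    "i like to read newspapers",
    "the stock market rallied sharply today",
]
similar = select_similar(monolingual_corpus, learner_corpus)
print(similar)
```

Only the selected sentences would then receive pseudo errors, which is what keeps the amount of pseudo data small.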
Highlights
Several studies have proposed models to solve the grammatical error correction (GEC) task as a writing-support application for learners of various languages, such as English or Russian
We show that the proposed pseudo data generation method improves the F0.5 scores of the GEC model
We show the effect of realistic pseudo errors by considering the types of errors typically made by language learners for the Russian GEC task
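For readers unfamiliar with the metric, F0.5 is the F-beta score with beta = 0.5, which weights precision more heavily than recall. The sketch below computes it from edit-level counts. The function name `f_beta` and the example counts are illustrative and are not taken from the study.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta from true-positive, false-positive, and false-negative edit counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts: 6 correct edits, 2 spurious edits, 6 missed edits.
score_f05 = f_beta(6, 2, 6)          # precision 0.75, recall 0.50
score_f1 = f_beta(6, 2, 6, beta=1.0)
print(round(score_f05, 4), round(score_f1, 4))
```

Because precision is higher than recall in this example, F0.5 comes out above F1, reflecting the metric's preference for systems that avoid spurious corrections.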
Summary
Several studies have proposed models to solve the grammatical error correction (GEC) task as a writing-support application for learners of various languages, such as English or Russian. A standard approach to improving GEC models is to incorporate pseudo errors into large monolingual datasets for pre-training. Building on this approach, several methods have been proposed for generating pseudo data to pre-train a GEC model. In this study, we generate pseudo data for training GEC models by considering the types of errors made by language learners, and we study the effect of this realistic pseudo training data. We show the effect of these realistic pseudo errors on the Russian GEC task.
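Generating pseudo errors according to learner error types could look roughly like the sketch below. It is a minimal illustration, not the authors' procedure: the four error operations, the `ERROR_WEIGHTS` distribution, and the `error_rate` parameter are all hypothetical. In practice the weights would be estimated from an annotated learner corpus.

```python
import random


def misspell(tokens, i, rng):
    """Swap two adjacent characters inside token i."""
    word = tokens[i]
    if len(word) < 2:
        return tokens
    j = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[j], chars[j + 1] = chars[j + 1], chars[j]
    return tokens[:i] + ["".join(chars)] + tokens[i + 1:]


def delete_token(tokens, i, rng):
    """Drop token i, unless it is the only token left."""
    return tokens if len(tokens) < 2 else tokens[:i] + tokens[i + 1:]


def swap_with_next(tokens, i, rng):
    """Exchange token i with its right neighbour, if there is one."""
    if i + 1 >= len(tokens):
        return tokens
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def duplicate_token(tokens, i, rng):
    """Insert a copy of token i right after it."""
    return tokens[:i + 1] + [tokens[i]] + tokens[i + 1:]


ERROR_OPS = {
    "spelling": misspell,
    "deletion": delete_token,
    "word_order": swap_with_next,
    "insertion": duplicate_token,
}

# Hypothetical error-type distribution; not taken from the study.
ERROR_WEIGHTS = {"spelling": 0.4, "deletion": 0.3, "word_order": 0.2, "insertion": 0.1}


def make_pseudo_pair(sentence, error_rate=0.15, seed=None):
    """Return (noisy_source, clean_target) for pre-training a GEC model."""
    rng = random.Random(seed)
    noisy = sentence.split()
    types = list(ERROR_WEIGHTS)
    weights = [ERROR_WEIGHTS[t] for t in types]
    i = 0
    while i < len(noisy):
        if rng.random() < error_rate:
            error_type = rng.choices(types, weights=weights, k=1)[0]
            noisy = ERROR_OPS[error_type](noisy, i, rng)
        i += 1
    return " ".join(noisy), sentence


clean = "the students wrote a long essay yesterday"
source, target = make_pseudo_pair(clean, error_rate=0.5, seed=0)
print(source, "->", target)
```

Sampling error types from a learner-derived distribution, rather than uniformly at random, is what distinguishes this kind of pseudo data from random noising.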