Grammatical Error Correction Using Pseudo Learner Corpus Considering Learner’s Error Tendency

Yujin Takahashi, Mamoru Komachi, Satoru Katsumata

doi:10.18653/v1/2020.acl-srw.5

Abstract

Recently, several studies have focused on improving the performance of grammatical error correction (GEC) using pseudo data. However, a large amount of pseudo data is required to train an accurate GEC model. To address the limitations of language and computational resources, we hypothesize that introducing pseudo errors into sentences similar to those written by language learners is more efficient than incorporating random pseudo errors into monolingual data. We therefore study the effect of pseudo data on GEC performance using two approaches. First, we extract sentences that are similar to learners' sentences from monolingual data. Second, we generate realistic pseudo errors by considering the error types that learners often make. Our comparative results show that F0.5 scores for the Russian GEC task improve significantly.
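The first approach in the abstract, selecting monolingual sentences that resemble learner writing, can be illustrated with a minimal sketch. The similarity measure here (cosine similarity between bag-of-words counts and a summed learner-corpus profile) is an assumption made for illustration; this page does not state which measure the paper uses, and the function names `bag_of_words`, `cosine`, and `select_similar` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercased whitespace tokens as a sparse count vector."""
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(monolingual, learner, top_k):
    """Return the top_k monolingual sentences closest to the learner corpus."""
    # Assumption: the learner corpus is summarized by its summed token counts.
    learner_profile = Counter()
    for sentence in learner:
        learner_profile.update(bag_of_words(sentence))
    ranked = sorted(
        monolingual,
        key=lambda s: cosine(bag_of_words(s), learner_profile),
        reverse=True,
    )
    return ranked[:top_k]


learner = ["i like to study russian", "i study grammar every day"]
monolingual = [
    "the stock market fell sharply",
    "i like to study every day",
    "quantum chromodynamics is hard",
]
selected = select_similar(monolingual, learner, top_k=1)
```

In this toy example only the second monolingual sentence shares vocabulary with the learner sentences, so it is the one selected. A real pipeline would likely need a stronger representation than raw token counts, especially for a morphologically rich language such as Russian.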

Highlights

Several studies have proposed models for the grammatical error correction (GEC) task as a writing-support application for learners of various languages, such as English and Russian.
We show that the proposed pseudo data generation method improves the F0.5 scores of the GEC model.
For the Russian GEC task, we show the effect of realistic pseudo errors that reflect the types of errors language learners typically make.
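For readers unfamiliar with the metric, F0.5 is the F-beta score with beta = 0.5, which weights precision more heavily than recall. The sketch below computes it from counts of correct, spurious, and missed edits; the function name `f_beta` and the example counts are illustrative assumptions, not figures from the paper.

```python
def f_beta(true_positives: int, false_positives: int,
           false_negatives: int, beta: float = 0.5) -> float:
    """F-beta score from edit counts; beta < 1 favors precision."""
    predicted = true_positives + false_positives
    gold = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / gold if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Made-up counts: 6 correct edits, 2 spurious edits, 6 missed edits.
f_half = f_beta(6, 2, 6)          # precision 0.75, recall 0.5
f_one = f_beta(6, 2, 6, beta=1)   # the balanced F1 for comparison
```

With precision above recall, as in these counts, F0.5 comes out higher than F1, which reflects the preference in GEC evaluation for systems that avoid proposing wrong corrections.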

Summary

Introduction

Several studies have proposed models for the grammatical error correction (GEC) task as a writing-support application for learners of various languages, such as English and Russian. A standard approach to improving GEC models is to incorporate pseudo errors into large monolingual datasets for pre-training, and several methods have been proposed for generating such pseudo data. In this study, we generate pseudo data for training GEC models by considering the types of errors made by language learners, and we study the effect of this realistic pseudo training data. For the Russian GEC task, we show the effect of realistic pseudo errors that reflect the types of errors language learners typically make.
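The idea of injecting pseudo errors according to learner error tendencies can be sketched as token-level corruption driven by per-type probabilities. Everything specific here is an assumption for illustration: the three operations (delete, replace, insert), the rate values, the toy vocabulary, and the function name `inject_errors` are hypothetical and are not taken from the paper. In practice the rates would be estimated from an annotated learner corpus.

```python
import random


def inject_errors(tokens, error_rates, vocabulary, rng):
    """Corrupt a clean token list using per-type error probabilities."""
    noisy = []
    for token in tokens:
        draw = rng.random()
        if draw < error_rates["delete"]:
            # Deletion error: drop the token entirely.
            pass
        elif draw < error_rates["delete"] + error_rates["replace"]:
            # Replacement error: substitute a random vocabulary item.
            noisy.append(rng.choice(vocabulary))
        else:
            noisy.append(token)
        if rng.random() < error_rates["insert"]:
            # Insertion error: add a spurious token at this position.
            noisy.append(rng.choice(vocabulary))
    return noisy


# Hypothetical rates; a learner-informed setup would estimate these
# from the observed frequency of each error type.
rates = {"delete": 0.05, "replace": 0.10, "insert": 0.05}
vocab = ["в", "на", "и", "не", "что"]
clean = "я читаю книгу в библиотеке".split()
pseudo_source = inject_errors(clean, rates, vocab, random.Random(0))
# (pseudo_source, clean) then forms one pseudo training pair.
```

Passing a seeded `random.Random` keeps the generated corpus reproducible. Skewing the rates toward the error types learners actually make is the contrast with uniform random noise that the text describes.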

Related Works

Method for Pseudo Data Generation

Data Selection

Error Types

Experiments

Experimental Setting

Result

Analysis

Findings

Conclusions


Publication Date: Jan 1, 2020
Citations: 22
License type: CC BY

Similar Papers

Chinese Grammatical Error Correction Using Pre-trained Models and Pseudo Data
Hongfei Wang ... Michiki Kurosawa
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 22
10 Mar 2023

Cross-lingual Transfer Learning for Grammatical Error Correction
25 Nov 2020

Cross-lingual Transfer Learning for Grammatical Error Correction
Ikumi Yamashita ... Mamoru Komachi
01 Jan 2020

A Hybrid System for Chinese Grammatical Error Diagnosis and Correction
Chen Li ... Junpei Zhou
01 Jan 2018
