Abstract

Grammatical error correction (GEC) has been successful with deep and complex neural machine translation models, but the annotated data needed to train such models are scarce. We propose a novel self-feeding training method that generates incorrect sentences from freely available correct sentences. The proposed method can generate appropriate incorrect sentences from unlabeled sentences, using a data generation model trained as an autoencoder. It can also add artificial noise to correct sentences to automatically produce incorrect counterparts. We show that GEC models trained with the self-feeding method succeed without extra annotated data or deeper neural network-based models, achieving an F0.5 score of 0.5982 on the CoNLL-2014 Shared Task test data with a transformer model. The results also show that fully unlabeled training is possible for data-scarce domains and languages.
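To make the artificial-noise idea concrete, the sketch below corrupts a correct sentence at the token level to produce a synthetic incorrect source sentence for GEC training. It is a minimal illustration only: the specific operations (delete, duplicate, swap), the noise probability, and the function name are assumptions for demonstration, not the paper's exact noising scheme or its autoencoder-based generator.

```python
import random

# Illustrative token-level noising: corrupt a correct sentence to create a
# synthetic incorrect source for GEC training. The operations and probability
# below are assumed for demonstration, not the paper's exact scheme.
def add_artificial_noise(tokens, p_noise=0.15, rng=None):
    rng = rng or random.Random(0)
    tokens = list(tokens)              # work on a copy
    noisy, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if rng.random() >= p_noise:
            noisy.append(tok)          # keep the token unchanged
            i += 1
            continue
        op = rng.choice(["delete", "duplicate", "swap"])
        if op == "delete":
            i += 1                     # drop the token
        elif op == "duplicate":
            noisy.extend([tok, tok])   # repeat the token
            i += 1
        elif i + 1 < len(tokens):      # swap with the next token
            noisy.extend([tokens[i + 1], tok])
            i += 2
        else:
            noisy.append(tok)
            i += 1
    return noisy

correct = "She has lived in London for five years .".split()
incorrect = add_artificial_noise(correct)
print("correct:  ", " ".join(correct))
print("incorrect:", " ".join(incorrect))
```

Pairs of (incorrect, correct) sentences produced this way can then serve as source-target examples for a sequence-to-sequence GEC model such as the transformer mentioned in the abstract.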
