Abstract

Finding a single model capable of comprehending multiple languages is an area of active research in Natural Language Processing (NLP). Recently developed models such as mBART, mT5, or xProphetNet can solve tasks such as machine translation and summarization in many languages. However, good multilingual solutions to the problem of Grammatical Error Correction (GEC) are still missing, and this paper aims to fill that gap. We first review current annotated GEC datasets and then apply existing pre-trained multilingual models to correct grammatical errors in multiple languages. In our experiments, we compare how different pre-training approaches impact the final GEC quality. The result is a single model that corrects text in seven different languages and achieves the best F-score reported to date for a multilingual GEC model. Additionally, our multilingual model outperforms the state-of-the-art monolingual model for Romanian.
