Abstract

Finding a single model capable of comprehending multiple languages is an area of active research in Natural Language Processing (NLP). Recently developed models such as mBART, mT5, or xProphetNet can solve tasks such as machine translation and summarization in many languages. However, good multilingual solutions to the problem of Grammatical Error Correction (GEC) are still missing, and this paper aims to fill that gap. We first review current annotated GEC datasets and then apply existing pre-trained multilingual models to correct grammatical errors in multiple languages. In our experiments, we compare how different pre-training approaches impact the final GEC quality. The result is a single model that corrects text in seven different languages and achieves the best F-score reported to date for a multilingual GEC model. Additionally, our multilingual model outperforms the state-of-the-art monolingual model for Romanian.
