Grammatical Error Correction for Basque through a seq2seq neural architecture and synthetic examples

Zuhaitz Beloki, Xabier Saralegi, Klara Ceberio, Ander Corral

doi:10.26342/2020-65-1

Journal: Procesamiento Del Lenguaje Natural

Publication Date: Oct 17, 2020

Abstract

Sequence-to-sequence neural architectures are the state of the art for grammatical error correction, but they require large training datasets. This paper studies the use of sequence-to-sequence neural models for correcting grammatical errors in Basque. As no training data exists for this language, we developed a rule-based method that generates grammatically incorrect sentences from correct sentences extracted from a corpus of 500,000 news articles in Basque. We built several training datasets, each following a different strategy for combining the synthetic examples. On these datasets we trained several models based on the Transformer architecture and evaluated them by precision, recall and F0.5 score. The best model reaches an F0.5 score of 0.87.
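
For readers unfamiliar with the F0.5 score reported above, the following is a minimal sketch of the standard F-beta formula, which combines precision and recall and, with beta = 0.5, weights precision more heavily than recall. The function name `f_beta` and the edit counts in the usage example are illustrative assumptions. They are not taken from the paper, and the abstract does not say how the authors count correct, spurious and missed corrections.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Return the F-beta score from counts of proposed corrections.

    tp: corrections that match the reference (true positives)
    fp: corrections that do not match the reference (false positives)
    fn: reference corrections the system missed (false negatives)
    beta < 1 weights precision more than recall; beta = 0.5 gives F0.5.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only: precision = 0.8, recall = 0.5.
score = f_beta(tp=8, fp=2, fn=8)
print(round(score, 3))
```

With these hypothetical counts the F0.5 score sits closer to the precision (0.8) than to the recall (0.5). This is the usual reason for preferring F0.5 in error correction, where proposing a wrong correction is generally considered worse than missing one.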
