Semi-supervised learning and bidirectional decoding for effective grammar correction in low-resource scenarios

Zeinab Mahmoud, Abdelzahir Abdelmaboud, Marco Zappatore, Chunlin Li, Ali Alfatemi, Ashraf Osman Ibrahim, Aiman Solyman

doi:10.7717/peerj-cs.1639

Abstract

Grammatical error correction (GEC) is an important task in natural language processing because it improves the accuracy and intelligibility of written language. Developing a GEC framework for low-resource languages is challenging, however, because little training data is available. This article proposes a novel GEC framework for low-resource languages, using Arabic as a case study. To generate more training data, we propose a semi-supervised confusion method called the equal distribution of synthetic errors (EDSE), which generates a wide range of parallel training data. The article also addresses two limitations of the classical seq2seq GEC model: unbalanced outputs caused by the unidirectional decoder, and exposure bias during inference. To overcome these limitations, we apply a knowledge distillation technique from neural machine translation. This method uses two decoders, a forward (left-to-right) decoder and a backward (right-to-left) decoder, and measures their agreement using Kullback-Leibler divergence as a regularization term. Experimental results on two benchmarks show that the proposed framework outperforms the Transformer baseline and two widely used bidirectional decoding techniques, namely asynchronous and synchronous bidirectional decoding. The proposed framework achieved the highest F1 score, and generating synthetic data using the equal distribution technique for syntactic errors resulted in a significant improvement in performance. These findings demonstrate the effectiveness of the proposed framework for grammatical error correction in low-resource languages, particularly Arabic.
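
The agreement term between the two decoders can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the symmetric form of the divergence, the position alignment by reversing the backward decoder's outputs, and the `weight` parameter are all assumptions made for illustration. The abstract states only that agreement is measured with Kullback-Leibler divergence and used as a regularization term.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def agreement_penalty(forward_logits, backward_logits):
    """Mean symmetric KL divergence between position-aligned distributions.

    forward_logits:  per-position logits in left-to-right generation order.
    backward_logits: per-position logits in right-to-left generation order.
    Assumption: both decoders emit the same number of positions, so reversing
    the backward outputs aligns them with the forward outputs.
    """
    if not forward_logits or len(forward_logits) != len(backward_logits):
        raise ValueError("decoders must produce the same non-zero length")
    aligned_backward = list(reversed(backward_logits))
    total = 0.0
    for f_scores, b_scores in zip(forward_logits, aligned_backward):
        p, q = softmax(f_scores), softmax(b_scores)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(forward_logits)


def regularized_loss(nll_forward, nll_backward,
                     forward_logits, backward_logits, weight=0.5):
    """Sum of both decoders' losses plus the weighted agreement penalty.

    `weight` is a hypothetical hyperparameter, not a value from the article.
    """
    penalty = agreement_penalty(forward_logits, backward_logits)
    return nll_forward + nll_backward + weight * penalty
```

In this sketch the penalty is zero when the two decoders assign identical distributions to every target position and grows as they diverge, so minimizing the combined loss pushes the two decoding directions toward agreement.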

Journal: PeerJ Computer Science
Publication Date: Oct 24, 2023
License type: CC BY 4.0
