Abstract

Automatic Grammatical Error Correction (GEC) detects and corrects various types of syntactic, spelling, and grammatical errors. Different approaches, such as rule-based, Statistical Machine Translation (SMT), and Neural Machine Translation (NMT), have been proposed. Among these, NMT based on the seq2seq Transformer with multi-head attention performs best. A key shortcoming of seq2seq GEC models with multiple encoder-decoder layers is that only the top layer's representation is exploited in subsequent processing. In addition, because of the exposure bias problem, at inference time the model conditions on its own previously generated words rather than the gold target words, which can degrade output quality. This paper proposes a GEC model based on the seq2seq Transformer for low-resource languages such as Arabic to address these issues. First, we propose a noising method for constructing synthetic parallel data to overcome the bottleneck arising from the lack of training corpora. Furthermore, motivated by the success of capsule networks in computer vision, we use the Expectation-Maximization routing algorithm to dynamically aggregate information across layers for Arabic GEC. Moreover, to mitigate the exposure bias problem, we introduce a bidirectional regularization term based on Kullback-Leibler divergence into the training objective to improve the agreement between right-to-left and left-to-right models. Experiments on the QALB-2014 and QALB-2015 benchmarks show that our proposed model achieves the best F1 score compared to existing Arabic GEC systems.
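The abstract does not state the exact form of the regularized training objective; the following is a minimal sketch of one plausible formulation, in which the weight $\lambda$, the cross-entropy losses, and the symmetrized divergence are our assumptions rather than the paper's stated equation:

$$
\mathcal{L}(\theta_{\rightarrow}, \theta_{\leftarrow})
= \mathcal{L}_{\mathrm{CE}}(\theta_{\rightarrow})
+ \mathcal{L}_{\mathrm{CE}}(\theta_{\leftarrow})
+ \lambda \left[ D_{\mathrm{KL}}\!\left(P_{\rightarrow}(y \mid x) \,\|\, P_{\leftarrow}(y \mid x)\right)
+ D_{\mathrm{KL}}\!\left(P_{\leftarrow}(y \mid x) \,\|\, P_{\rightarrow}(y \mid x)\right) \right]
$$

where $P_{\rightarrow}$ and $P_{\leftarrow}$ denote the output distributions of the left-to-right and right-to-left decoders. Minimizing the symmetric KL terms pushes the two decoding directions to agree, which matches the stated goal of the bidirectional regularizer.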
