Abstract

Chinese grammatical error correction (CGEC) is a significant challenge in Chinese natural language processing. Because deep-learning-based models cast the task as a sequence-to-sequence problem, they tend to have tens or even hundreds of millions of parameters, which in turn demands large annotated corpora for training and parameter tuning. However, few open-source annotated corpora are available for CGEC, and existing research mainly relies on data augmentation to alleviate this data scarcity. In this paper, rather than expanding the training data, we take a new perspective and propose a competitive CGEC model with a reduced parameter count. The model contains three main components: a sequence learning module, a grammatical generalization module, and a parameter sharing module. Experimental results on two Chinese benchmarks demonstrate that the proposed model achieves competitive performance against several baselines. Even with its parameter count reduced by one-third, the model reaches a comparable $F_{0.5}$ score of 30.75%. Furthermore, we evaluate the generalization and scalability of the model on English datasets. This offers a new and feasible research direction for CGEC.
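The abstract does not specify how the parameter sharing module is realized. As a minimal, hedged sketch of the general idea of cutting a sequence model's parameter count through cross-layer weight sharing (the class name `SharedLayerEncoder` and the `num_passes` setting are hypothetical illustrations, not the paper's method):

```python
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """Illustrative sketch: a single Transformer encoder layer applied
    repeatedly, so every depth position reuses the same weights and the
    parameter count stays roughly that of one layer."""

    def __init__(self, d_model=512, nhead=8, num_passes=6):
        super().__init__()
        # One layer instance; running it num_passes times shares its
        # parameters across the whole stack instead of allocating
        # num_passes independent layers.
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.num_passes = num_passes

    def forward(self, x):
        for _ in range(self.num_passes):
            x = self.layer(x)
        return x

# Usage: a batch of 2 sequences of length 10 with hidden size 512.
encoder = SharedLayerEncoder()
out = encoder(torch.randn(2, 10, 512))
print(out.shape)  # torch.Size([2, 10, 512])
```

Under this kind of scheme, a six-layer stack holds only one layer's worth of weights, which is one plausible way a model's parameter budget could shrink by a constant fraction without changing its depth.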
