Abstract

A Chinese grammatical error correction (CGEC) method based on the BERT-BiLSTM-CRF model is proposed. First, the pre-trained BERT model is used to generate deep bidirectional representation vectors that incorporate contextual information. Second, a BiLSTM layer models the dependencies within the text to extract the information relevant to correction. Finally, because text correction involves substitution, insertion, deletion, and permutation operations, a conditional random field (CRF) layer is added to constrain the output labels so that they conform to grammatical rules, yielding the globally optimal output sequence. In addition, most characters in the input text need no modification and are easily predicted, so training should concentrate on the erroneous positions. Focal loss is therefore introduced to shift the training signal toward these positions. Experiments on the SIGHAN15 and HybirdSet datasets show that the proposed approach achieves superior sentence-level accuracy, precision, recall, and F1 on both the error detection and the error correction tasks.
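
To illustrate why focal loss suits this setting, the following is a minimal sketch of the standard focal loss, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), applied to per-character predictions. It is not the authors' implementation: the function names and the values gamma = 2.0 and alpha = 1.0 are assumptions chosen for illustration, since the abstract does not state the hyperparameters used.

```python
import math


def focal_loss(probs, targets, gamma=2.0, alpha=1.0):
    """Mean focal loss over a sequence of per-character predictions.

    probs   -- list of rows, each row a list of class probabilities
    targets -- list of gold class indices, one per row
    gamma, alpha -- illustrative defaults (assumed, not from the paper)
    """
    total = 0.0
    for row, gold in zip(probs, targets):
        # Probability assigned to the gold class, clamped to avoid log(0).
        p_t = max(row[gold], 1e-12)
        # (1 - p_t) ** gamma shrinks the loss of confidently correct characters.
        total += -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(targets)


def cross_entropy(probs, targets):
    """Plain cross-entropy, which is focal loss with gamma = 0 and alpha = 1."""
    return focal_loss(probs, targets, gamma=0.0, alpha=1.0)


# One easily predicted character and one hard (likely erroneous) position.
easy = [[0.95, 0.05]]
hard = [[0.30, 0.70]]
gold = [0]

# How much more the hard position weighs than the easy one under each loss.
ce_ratio = cross_entropy(hard, gold) / cross_entropy(easy, gold)
fl_ratio = focal_loss(hard, gold) / focal_loss(easy, gold)
```

In this sketch `fl_ratio` is much larger than `ce_ratio`: relative to cross-entropy, the correctly predicted character contributes far less to the total loss, so the gradient is dominated by the position the model gets wrong.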
