Chinese grammatical error correction based on the BERT-BiLSTM-CRF model

Fangfang Gu, Zhuo Wang

doi:10.1117/12.2675154

Abstract

A Chinese grammatical error correction (CGEC) method based on the BERT-BiLSTM-CRF model is proposed. First, the pre-trained BERT model generates deep bidirectional representation vectors that incorporate contextual information. Second, a BiLSTM models the dependencies within the text to extract the information relevant to correction. Finally, because text correction involves substitution, insertion, deletion, and permutation operations, a conditional random field (CRF) layer is added to constrain the output labels so that they conform to grammatical rules, yielding the globally optimal text sequence. In addition, most characters in the input text need no modification and are easily predicted, so training should concentrate on the erroneous positions; focal loss is introduced for this purpose. Experiments on the SIGHAN15 and HybridSet datasets indicate that the approach achieves superior sentence-level accuracy, precision, recall, and F1 scores on both the error detection and error correction tasks.
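
The role of focal loss described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation or hyperparameters used, so the code below assumes the standard form FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), with gamma = 2 and alpha = 1 chosen only for illustration. The function names `focal_loss` and `cross_entropy` are hypothetical and are not taken from the paper.

```python
import math


def focal_loss(probs, targets, gamma=2.0, alpha=1.0):
    """Mean focal loss over the tokens of one sequence.

    probs:   one list of class probabilities per token
    targets: the index of the correct class for each token
    gamma, alpha: illustrative defaults, not values from the paper
    """
    total = 0.0
    for dist, y in zip(probs, targets):
        # Probability the model assigns to the true class (clamped for log).
        p_t = max(dist[y], 1e-12)
        # (1 - p_t)^gamma shrinks the loss of confidently correct tokens.
        total += -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(targets)


def cross_entropy(probs, targets):
    # With gamma = 0 and alpha = 1, focal loss reduces to cross-entropy.
    return focal_loss(probs, targets, gamma=0.0, alpha=1.0)


# An "easy" token (an unchanged character, predicted with p = 0.95)
# and a "hard" token (an erroneous position, predicted with p = 0.20).
easy = [[0.95, 0.05]]
hard = [[0.20, 0.80]]

ce_ratio = cross_entropy(hard, [0]) / cross_entropy(easy, [0])
fl_ratio = focal_loss(hard, [0]) / focal_loss(easy, [0])
print(f"hard/easy loss ratio, cross-entropy: {ce_ratio:.1f}")
print(f"hard/easy loss ratio, focal loss:    {fl_ratio:.1f}")
```

Under these assumptions, the hard token outweighs the easy token far more under focal loss than under cross-entropy, which is the effect the abstract relies on: the many correctly predicted characters contribute little, and the gradient is dominated by the positions that need correction.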
