Abstract

Errors are inevitable when users input text, and with the rapid development and spread of smart devices the problem has become increasingly serious. Text correction has therefore become an important research direction in natural language processing. In this paper, the correction of Chinese text is treated as a grammatical error correction task, that is, as the conversion of an erroneous sentence into a correct one. To fit this formulation, a sequence-to-sequence (Seq2Seq) model is introduced: the erroneous sentence serves as the source sentence, the correct sentence serves as the target sentence, and supervised training is carried out at the character and word level. The model can correct errors such as homophone, similar-shape, and near-sound character confusions, greatly reducing the manual effort and expert knowledge required for feature engineering while improving accuracy on these specific error types. To address the information loss caused by compressing a long sequence into a fixed-length vector, an attention mechanism is added to the basic model. With attention, the model's accuracy, recall, and F1 score are all effectively improved.
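
The following is a minimal, illustrative sketch of the kind of model the abstract describes: a character-level Seq2Seq corrector with additive attention, written in PyTorch. It is not the authors' implementation; the vocabulary size, hidden size, class and variable names, and the toy tensors are all hypothetical and chosen only to make the example self-contained.

```python
# Illustrative sketch (not the paper's code): character-level Seq2Seq error
# correction with additive attention. All sizes and names are assumptions.
import torch
import torch.nn as nn


class Seq2SeqCorrector(nn.Module):
    def __init__(self, vocab_size=4000, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Encoder reads the erroneous source sentence character by character.
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Decoder generates the corrected target sentence one character at a time.
        self.decoder = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        # Additive attention over the encoder states, scored against the decoder state.
        self.attn_W = nn.Linear(hid_dim * 2, hid_dim)
        self.attn_v = nn.Linear(hid_dim, 1, bias=False)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt_in):
        enc_states, hidden = self.encoder(self.embed(src))        # (B, S, H)
        logits = []
        for t in range(tgt_in.size(1)):
            # Score every source position against the current decoder state.
            query = hidden[-1].unsqueeze(1).expand(-1, enc_states.size(1), -1)
            scores = self.attn_v(torch.tanh(
                self.attn_W(torch.cat([enc_states, query], dim=-1)))).squeeze(-1)
            weights = torch.softmax(scores, dim=-1)                # (B, S)
            # Context vector: attention-weighted sum of encoder states.
            context = torch.bmm(weights.unsqueeze(1), enc_states)  # (B, 1, H)
            step_in = torch.cat([self.embed(tgt_in[:, t:t + 1]), context], dim=-1)
            dec_out, hidden = self.decoder(step_in, hidden)
            logits.append(self.out(dec_out))
        return torch.cat(logits, dim=1)                            # (B, T, vocab)


# Toy supervised training step: the wrong sentence is the source, the correct
# sentence is the target. Teacher forcing feeds tgt[:, :-1] and predicts
# tgt[:, 1:], assuming the first target token is a <bos> marker (hypothetical).
model = Seq2SeqCorrector()
src = torch.randint(0, 4000, (2, 10))   # batch of 2 erroneous sentences
tgt = torch.randint(0, 4000, (2, 12))   # corresponding corrected sentences
logits = model(src, tgt[:, :-1])
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 4000), tgt[:, 1:].reshape(-1))
loss.backward()
```

Here the attention weights recomputed at each decoding step let the decoder consult the full source sentence rather than a single fixed-length vector, which is the information-loss problem the abstract says the attention mechanism is meant to address.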
