A Two-Stage Model for Chinese Grammatical Error Correction

Zhaoquan Qiu, Youli Qu

doi:10.1109/access.2019.2940607

Abstract

Chinese grammatical error correction (GEC) is more challenging than English GEC because of the characteristics of the Chinese language. In this paper, a two-stage model is proposed to solve the Chinese GEC problem. The model consists of two components: a spelling check model and a GEC model. The spelling check model, based on a language model, focuses on correcting spelling errors, while the GEC model, based on a neural sequence-to-sequence (seq2seq) model, focuses on correcting grammatical errors. In addition, two generative methods allow the seq2seq model to correct an erroneous sentence incrementally over repeated inference steps. Furthermore, only one seq2seq model is used for grammatical correction rather than an ensemble of multiple models, which greatly speeds up the generation of final results and saves computing resources. The two-stage model achieves an F0.5 score of 31.01 on the NLPCC 2018 test set, significantly outperforming all prior approaches on this task.
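For readers unfamiliar with the metric, the F0.5 score reported above is the standard F-beta measure with beta = 0.5, which weights precision more heavily than recall. The snippet below is a minimal sketch of that standard formula only; the function name `f_beta` and the sample precision and recall values are illustrative assumptions and are not figures from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical values chosen only to show that precision dominates at beta = 0.5.
high_precision = f_beta(precision=0.8, recall=0.4)
high_recall = f_beta(precision=0.4, recall=0.8)
print(high_precision > high_recall)
```

With beta = 0.5, a system that proposes fewer but more reliable corrections scores higher than one that proposes many corrections of lower reliability.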

Highlights

Grammatical error correction (GEC) is an important task in natural language processing (NLP), which aims to detect and correct errors in text
This paper presents a two-stage model for Chinese GEC
Fu et al. [12]: the state-of-the-art Chinese GEC system on the NLPCC 2018 dataset, which is based on a spelling error correction model and a neural machine translation (NMT) model

Summary

INTRODUCTION

Grammatical error correction (GEC) is an important task in natural language processing (NLP), which aims to detect and correct errors in text. Previous work has mainly focused on the diagnosis of grammatical errors [10], [11] rather than their correction. The NLPCC 2018 shared task gives NLP researchers an opportunity to study and develop Chinese GEC. To alleviate the data sparsity problem and the language difference problem, we decompose the Chinese GEC task into two subtasks: spelling check based on a language model and grammatical error correction based on a seq2seq model. The intermediate results from the spelling check are then translated into a clean, grammatically correct sentence by the seq2seq model. A seq2seq model may not be able to correct every error in a sentence containing multiple grammatical errors in a single round of inference.
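The two-stage decomposition and the idea of repeated inference can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `two_stage_correct`, `iterative_correct`, `toy_spell_check`, and `toy_gec_step` are hypothetical, the toy rules stand in for the language model and the seq2seq model, and the stop-when-unchanged loop is a generic strategy rather than either of the paper's two generative methods, which this summary does not specify.

```python
from typing import Callable

# A corrector maps a sentence to a (possibly) revised sentence.
Corrector = Callable[[str], str]


def iterative_correct(sentence: str, gec_step: Corrector, max_rounds: int = 5) -> str:
    """Re-apply one correction step until the output stops changing."""
    current = sentence
    for _ in range(max_rounds):
        revised = gec_step(current)
        if revised == current:
            break  # no further edits proposed, so stop early
        current = revised
    return current


def two_stage_correct(sentence: str, spell_check: Corrector,
                      gec_step: Corrector, max_rounds: int = 5) -> str:
    """Stage 1: spelling check. Stage 2: repeated grammatical correction."""
    intermediate = spell_check(sentence)
    return iterative_correct(intermediate, gec_step, max_rounds)


# Toy stand-ins (assumptions for illustration only, in English for readability).
def toy_spell_check(sentence: str) -> str:
    return sentence.replace("teh", "the")


TOY_RULES = (("a apple", "an apple"), ("go yesterday", "went yesterday"))


def toy_gec_step(sentence: str) -> str:
    # Deliberately fixes at most one error per call, mimicking a model
    # that cannot correct everything in a single inference round.
    for wrong, right in TOY_RULES:
        if wrong in sentence:
            return sentence.replace(wrong, right, 1)
    return sentence


result = two_stage_correct("teh boy go yesterday and ate a apple",
                           toy_spell_check, toy_gec_step)
print(result)
```

Capping the loop with `max_rounds` bounds the inference cost, while the early exit avoids extra passes once the sentence is stable.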

RELATED WORK

SPELLING CHECK BASED LANGUAGE MODEL

Output

CONCLUSION