Chinese Spelling Error Detection Using a Fusion Lattice LSTM

Hao Wang, Bin Wang, Jiajun Zhang, Jianyong Duan

doi:10.1145/3426882

Abstract

Spelling error detection serves as a crucial preprocessing step in many natural language processing applications. Unlike English, where every word is typed directly on the keyboard, Chinese characters must be entered through an input method, and the pinyin input method is the most widely used. Intuitively, pinyin should be helpful in detecting spelling errors. However, when detecting spelling errors, most current methods ignore pinyin information and adopt a pipeline framework that leads to error propagation. In this article, we propose a fusion lattice-LSTM model under an end-to-end framework that integrates character, word, and pinyin features for error detection. Experiments on the SIGHAN Bake-off-2015 dataset show that pinyin is a discriminating feature and that our end-to-end model clearly outperforms the baseline models.

Full Text