Abstract

Spelling error detection serves as a crucial preprocessing step in many natural language processing applications. Unlike English, where each word is typed directly on the keyboard, Chinese characters must be entered through an input method, and the pinyin input method is the most widely used. Intuitively, pinyin should therefore be helpful in detecting spelling errors. However, when detecting spelling errors, most current methods ignore pinyin information and adopt a pipeline framework that leads to error propagation. In this article, we propose a fusion lattice-LSTM model under an end-to-end framework that integrates character, word, and pinyin features for error detection. Experiments on the SIGHAN Bake-off-2015 dataset show that pinyin is a discriminative feature and that our end-to-end model clearly outperforms the baseline models.
