Abstract

This paper presents our system for the Chinese spelling check (CSC) task of the SIGHAN-8 Bake-Off. Given a sentence, the system is designed to detect and correct spelling errors. CSC remains an active research topic and an open problem. N-gram language modeling (LM) is widely used in CSC because of its simplicity and power. We present a model based on a joint bigram and trigram LM combined with Chinese word segmentation. In addition, we apply dynamic programming to improve efficiency and employ a smoothing technique to address the sparseness of n-grams in the training data. The evaluation results show the utility of our CSC system.
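To illustrate the general idea, here is a minimal sketch of n-gram-based correction with dynamic programming. It is not the system described above: it assumes a character-level bigram LM only (no trigram component and no word segmentation), add-one smoothing as a stand-in for the unspecified smoothing technique, and a hypothetical `confusion` dictionary of candidate replacement characters. The function names, the toy corpus, and the confusion set are all illustrative assumptions.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count character unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        chars = ["<s>"] + list(sent) + ["</s>"]
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams


def log_prob(prev, cur, unigrams, bigrams):
    """Add-one smoothed log P(cur | prev); an assumed smoothing choice."""
    vocab = len(unigrams)
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))


def correct(sentence, confusion, unigrams, bigrams):
    """Viterbi search: pick the highest-scoring candidate at each position."""
    # best maps the last chosen character to (log score, path so far).
    best = {"<s>": (0.0, [])}
    for ch in sentence:
        # Candidates are the original character plus its hypothetical confusions.
        candidates = [ch] + sorted(confusion.get(ch, ()))
        new_best = {}
        for cand in candidates:
            score, path = max(
                ((s + log_prob(prev, cand, unigrams, bigrams), p)
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            new_best[cand] = (score, path + [cand])
        best = new_best
    # Close the sentence with the end marker and take the best full path.
    _, path = max(
        ((s + log_prob(prev, "</s>", unigrams, bigrams), p)
         for prev, (s, p) in best.items()),
        key=lambda t: t[0],
    )
    return "".join(path)


# Toy training data and confusion set, invented for illustration only.
corpus = ["我喜欢吃苹果", "他喜欢吃苹果", "我喜欢看书"]
confusion = {"平": {"苹"}}
unigrams, bigrams = train_bigram(corpus)
print(correct("我喜欢吃平果", confusion, unigrams, bigrams))
```

Because each position keeps only the best path per candidate character, the search cost grows linearly with sentence length rather than with the number of candidate combinations, which is the efficiency gain that dynamic programming typically provides in this setting.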
