Abstract

Automatic checking and error correction of Chinese text is an important and difficult problem, and it faces more challenges than automatic checking and error correction of Western text. The Chinese language has a large number of characters and no delimiters separating words, and errors cannot be detected and corrected by examining the internal composition of a character. In this paper, we describe some special features of Chinese characters and text, together with statistics obtained from a real-world Chinese text corpus. We then present a hybrid approach to automatic checking and error correction of Chinese text that combines a rule-based method with a probability-based method. We also present an experimental system, HSACCCT (Hybrid System of Automatic Checking and Correction for Chinese Text), that implements this hybrid approach, along with experimental results on real-world Chinese text.
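The following minimal Python sketch illustrates one way a rule-based step and a probability-based step could be combined without word segmentation. It is not the HSACCCT implementation. The confusion set, the toy corpus, the add-alpha smoothed character-bigram model, the `margin` threshold, and all function names (`train_bigrams`, `bigram_prob`, `context_score`, `check_and_correct`) are hypothetical assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical model: (bigram counts, history counts, vocabulary size).
Model = Tuple[Counter, Counter, int]


def train_bigrams(corpus: List[str]) -> Model:
    """Count adjacent-character bigrams; no word segmentation is needed."""
    bigrams: Counter = Counter()
    history: Counter = Counter()
    chars = set()
    for sent in corpus:
        chars.update(sent)
        history.update(sent[:-1])  # characters that have a right neighbour
        bigrams.update(sent[i:i + 2] for i in range(len(sent) - 1))
    return bigrams, history, len(chars) + 1  # +1 reserves mass for unseen characters


def bigram_prob(model: Model, a: str, b: str, alpha: float = 0.1) -> float:
    """Add-alpha smoothed estimate of P(b | a)."""
    bigrams, history, vocab = model
    return (bigrams[a + b] + alpha) / (history[a] + alpha * vocab)


def context_score(model: Model, chars: List[str], i: int, c: str) -> float:
    """Probability of placing character c at position i given its neighbours."""
    score = 1.0
    if i > 0:
        score *= bigram_prob(model, chars[i - 1], c)
    if i < len(chars) - 1:
        score *= bigram_prob(model, c, chars[i + 1])
    return score


def check_and_correct(text: str, model: Model, confusion: Dict[str, List[str]],
                      margin: float = 2.0) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Rule step: only characters listed in the (hypothetical) confusion set are suspects.
    Probability step: replace a suspect only if a candidate scores `margin` times higher."""
    chars = list(text)
    corrections = []
    for i, ch in enumerate(chars):
        candidates = confusion.get(ch)
        if not candidates:
            continue
        original = context_score(model, chars, i, ch)
        best = max(candidates, key=lambda c: context_score(model, chars, i, c))
        if context_score(model, chars, i, best) > margin * original:
            chars[i] = best
            corrections.append((i, ch, best))
    return "".join(chars), corrections


# Toy usage; the corpus and confusion pairs are illustrative assumptions.
CORPUS = ["我在学校学习", "他在家里学习", "我们在学校"]
CONFUSION = {"在": ["再"], "再": ["在"]}  # hypothetical homophone pair
MODEL = train_bigrams(CORPUS)
print(check_and_correct("我再学校学习", MODEL, CONFUSION))
# ('我在学校学习', [(1, '再', '在')])
```

Scoring adjacent characters rather than words sidesteps the lack of word delimiters. The `margin` parameter (also an assumption) keeps the probability step from overriding a character unless its context clearly favours a candidate from the rule-based confusion set.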
