Abstract

An approach is proposed for Chinese spelling error detection and correction, in which an inverted index list with a rescoring mechanism is used. The inverted index list is a structure for mapping from word to desired sentence, and for representing nodes in lattices constructed through character expansion (according to predefined phonologically and visually similar character sets). Pruning based on a contextual dependency confidence measure was used to markedly reduce the search space and computational complexity. Relevant mapping relations between the original input and desired input were obtained using a scoring mechanism composed of class-based language and maximum entropy correction models containing character, word, and contextual features. The proposed method was evaluated using data sets provided by SigHan 7 bakeoff. The experimental results show that the proposed method achieved acceptable performance in terms of recall rate or precision rate in error sentence detection and error location detection, and it outperformed other approaches in error location detection and correction.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call