Abstract

This study proposes a new algorithm that constructs a word Bayesian network (BN) framework with grapheme nodes to recognize off-line handwritten Uyghur words. First, we build a Uyghur script grapheme library according to the rules and morphological structure of Uyghur. The library includes main grapheme, affix grapheme, and dot grapheme categories. Second, word images are segmented into grapheme sequences by extracting, segmenting, and clustering the individual strokes. Third, we design dedicated feature extractors and classifiers for specific graphemes to detect and identify small differences between similar words. Finally, we construct a hierarchical matching model for graphemes, conjoined segments, and words using a discrete BN. The BN infers word categories from grapheme features, calculates the confidence of the inference, and integrates the grapheme recognition information with word-formation prior information to obtain the final word recognition results. A word recognition rate of 91.65% is obtained in experiments conducted on a database of 12,500 samples with a total of 58 trained grapheme categories. These results indicate that the proposed algorithm not only achieves a high word recognition rate by effectively avoiding character over-segmentation errors, but also requires a small and fully predictable number of training categories, which makes it easy to extend.
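The way grapheme recognition scores and word-formation priors might be combined can be illustrated with a minimal sketch. It is not the network described in this study: it assumes a much simpler structure in which one word node is the parent of independent grapheme nodes, and it assumes each segment's classifier returns a score per grapheme label. The function name `infer_word`, the lexicon layout, and the floor value for unseen labels are all hypothetical.

```python
from math import prod


def infer_word(lexicon, prior, grapheme_scores, floor=1e-6):
    """Rank candidate words given per-segment grapheme classifier scores.

    lexicon: dict mapping word -> tuple of grapheme labels (hypothetical layout)
    prior: dict mapping word -> prior probability of that word
    grapheme_scores: list of dicts, one per segment, mapping label -> score
    floor: assumed small score for labels the classifier did not report
    """
    unnormalized = {}
    for word, graphemes in lexicon.items():
        # Simplification: only words with one grapheme per segment are considered.
        if len(graphemes) != len(grapheme_scores):
            continue
        # Treat segments as conditionally independent given the word.
        likelihood = prod(
            scores.get(label, floor)
            for label, scores in zip(graphemes, grapheme_scores)
        )
        unnormalized[word] = prior[word] * likelihood

    total = sum(unnormalized.values())
    if total == 0:
        return None, 0.0, {}

    # Normalize so the scores over candidate words sum to one.
    posterior = {word: value / total for word, value in unnormalized.items()}
    best = max(posterior, key=posterior.get)
    # The posterior of the best word serves as the confidence of the inference.
    return best, posterior[best], posterior
```

In this simplified form, the prior lets a frequent word win over a rarer one even when a single grapheme classifier slightly prefers the rarer reading, which is the kind of integration of recognition evidence and word-formation knowledge the abstract describes.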
