Abstract

Chinese Character Recognition (CCR) is a critical application of Optical Character Recognition (OCR), a vital area of pattern recognition. Research on CCR over the past decades has focused mainly on modern Chinese characters rather than ancient ones. Compared to modern Chinese characters, ancient characters are more diverse, and multiple ancient characters can correspond to one modern character. These features make manual labeling for recognition very time-consuming. This paper proposes an automatic labeling algorithm based on a semi-supervised dictionary training neural network that drastically decreases human effort. We first created an offline training set, serving as a dictionary, of 8,226 Chinese characters from ancient documents rendered in modern fonts, and fed this set into the network. We then recursively retrained the network on an unlabeled data set of about 1.3 million character images segmented from ancient documents, reaching an accuracy of 98.96%. This work is one part of our broader project on recognizing ancient documents with handwritten Chinese characters.
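
The recursive retraining described above follows the general pattern of self-training (pseudo-labeling): fit on a small labeled dictionary, label the unlabeled samples the model is confident about, add them to the training set, and repeat. The sketch below illustrates only that loop and is not the paper's implementation. It assumes a nearest-centroid classifier in place of the neural network, a distance threshold as the confidence test, and toy 2-D feature vectors in place of character images; the names `self_train`, `threshold`, and `max_rounds` are hypothetical.

```python
from math import dist


def mean(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def nearest(centroids, x):
    """Return (label, distance) of the centroid closest to x."""
    label = min(centroids, key=lambda k: dist(centroids[k], x))
    return label, dist(centroids[label], x)


def self_train(dictionary, unlabeled, threshold=1.0, max_rounds=10):
    """Iteratively pseudo-label `unlabeled` starting from a labeled dictionary.

    dictionary: {label: [feature tuples]}  (stand-in for the labeled set)
    unlabeled:  [feature tuples]           (stand-in for segmented characters)
    Returns (assigned, remaining): a list of (sample, label) pairs and the
    samples that never passed the confidence test.
    """
    labeled = {k: list(v) for k, v in dictionary.items()}
    remaining = list(unlabeled)
    assigned = []
    for _ in range(max_rounds):
        # "Retrain": recompute one centroid per class from all labeled data.
        centroids = {k: mean(v) for k, v in labeled.items()}
        still_unlabeled = []
        for x in remaining:
            label, d = nearest(centroids, x)
            if d <= threshold:  # confident enough: accept the pseudo-label
                labeled[label].append(x)
                assigned.append((x, label))
            else:
                still_unlabeled.append(x)
        if len(still_unlabeled) == len(remaining):
            break  # no new labels this round, so further rounds cannot help
        remaining = still_unlabeled
    return assigned, remaining


if __name__ == "__main__":
    seed = {"A": [(0.0, 0.0)], "B": [(10.0, 10.0)]}
    samples = [(0.5, 0.0), (1.2, 0.0), (9.5, 10.0), (5.0, 5.0)]
    assigned, remaining = self_train(seed, samples, threshold=1.0)
    print(assigned)
    print(remaining)
```

In this toy run, the sample `(1.2, 0.0)` is too far from the initial centroid of class `A` to be labeled in the first round, but it is accepted in the second round once the centroid has shifted toward the newly labeled data. This is the effect that repeated retraining is meant to exploit. The ambiguous sample `(5.0, 5.0)` is never labeled and would be left for manual review.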

