Abstract

Historical Chinese character recognition has long suffered from labeling problems: not only are there too few labeled training samples, but the sample classes themselves are incompletely labeled. Historical Chinese character recognition is therefore an "open set" recognition problem, in which only some of the classes are labeled at training time and samples from unknown classes may be submitted to the system during testing. This paper proposes a method for open set historical Chinese character recognition. In open set recognition, the features available in the training data cannot effectively characterize the various unknown classes. We assume that features which characterize unknown classes can be derived or learned from other, similar data sets. We therefore combine an auxiliary data set with the open set training data set to learn good features for representing historical Chinese characters. The auxiliary data set is translated using Generative Adversarial Networks (GANs) so that the translated data set is as close as possible to the historical Chinese character data set. We then construct a neural network for feature extraction, which is trained by alternating between the translated auxiliary data set and the incompletely labeled historical Chinese character data set. Finally, features are extracted from a chosen layer of the trained network, and unknown samples are detected by statistically modelling the Euclidean distances between samples. Experimental results show that the proposed method is effective.
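The abstract does not say which statistical model is fitted to the Euclidean distances, so the following Python sketch is only an illustration under a stated assumption. It assumes that each known class is summarized by the centroid of its extracted feature vectors, that the distances of that class's training samples to the centroid are modelled by their mean and standard deviation, and that a test sample is rejected as unknown when its distance to the nearest centroid exceeds mean + k * std. The function names `fit_class_models` and `predict_open_set`, the threshold rule, and the parameter `k` are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def fit_class_models(features, labels):
    """Hypothetical helper: for each known class, return
    (centroid, mean distance, std of distances), where distances are
    Euclidean distances of that class's training features to its centroid."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    models = {}
    for y, xs in by_class.items():
        dim = len(xs[0])
        centroid = [sum(x[i] for x in xs) / len(xs) for i in range(dim)]
        dists = [math.dist(x, centroid) for x in xs]
        models[y] = (centroid, mean(dists), pstdev(dists))
    return models


def predict_open_set(x, models, k=3.0):
    """Hypothetical helper: assign x to the nearest class centroid, or return
    None (unknown class) if its distance exceeds mu + k * sigma for that class.
    The Gaussian-style threshold is an assumption, not the paper's model."""
    best_label, best_dist = None, math.inf
    for y, (centroid, _, _) in models.items():
        d = math.dist(x, centroid)
        if d < best_dist:
            best_label, best_dist = y, d
    _, mu, sigma = models[best_label]
    if best_dist > mu + k * sigma:
        return None
    return best_label


# Toy 2-D "features" standing in for vectors taken from a network layer.
train_x = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
train_y = ["A", "A", "A", "B", "B", "B"]
models = fit_class_models(train_x, train_y)
print(predict_open_set((0.1, 0.1), models))  # A
print(predict_open_set((2.5, 2.5), models))  # None (far from both classes)
```

A per-class threshold is used here because different character classes may be more or less compact in feature space; an extreme-value fit such as a Weibull model is another common choice for open set rejection, and the paper's actual choice may differ.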
