Abstract

Japanese historical documents provide valuable information, and character recognition is a critical technology for digitizing them. Sample imbalance is a significant obstacle in recognizing Japanese historical characters (kuzushiji): thousands of kuzushiji have no more than a few samples, and recognition performance deteriorates greatly for these characters. In this study, we propose a framework for transferring knowledge of character parts from fonts to kuzushiji. Pretraining learns character parts from synthesized font images. Fine-tuning to kuzushiji is more complex: we compute a mean squared error loss between the feature vectors of kuzushiji and font images, which makes the feature vectors of kuzushiji and fonts consistent. Consequently, we can perform zero-shot recognition of kuzushiji using the font images of zero-sampled kuzushiji. Experimental results show that the proposed method recognized zero-sampled kuzushiji with approximately 48% accuracy, significantly expanding the number of recognizable kuzushiji.
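The feature-alignment idea can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how a zero-sampled character is assigned a label, so the nearest-font-feature rule below is an assumption. The function names (`mse_loss`, `zero_shot_classify`) and the three-dimensional toy vectors are hypothetical stand-ins for the outputs of a trained feature extractor.

```python
def mse_loss(kuzushiji_feat, font_feat):
    """Mean squared error between two equal-length feature vectors."""
    if len(kuzushiji_feat) != len(font_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(k - f) ** 2 for k, f in zip(kuzushiji_feat, font_feat)]
    return sum(squared) / len(squared)


def zero_shot_classify(query_feat, font_features):
    """Return the label whose font feature vector has the smallest MSE to the query.

    Assumption: nearest-feature matching; the abstract does not specify the rule.
    """
    return min(font_features, key=lambda label: mse_loss(query_feat, font_features[label]))


# Hypothetical font feature vectors, one per character class.
font_features = {
    "あ": [1.0, 0.0, 0.0],
    "い": [0.0, 1.0, 0.0],
    "う": [0.0, 0.0, 1.0],
}

# Hypothetical feature vector of a kuzushiji image whose class has no training samples.
query = [0.1, 0.8, 0.2]
predicted = zero_shot_classify(query, font_features)
```

If fine-tuning has pulled kuzushiji features toward the font features of the same character, a class with no kuzushiji samples can still be matched through its font feature alone, which is the property this toy lookup relies on.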
