Abstract

This paper presents a novel approach to creating a synthetic dataset for word recognition systems. Our aim is to improve the performance of off-line handwritten text recognizers by providing them with additional synthetic training data. Because suitable datasets are lacking for many languages, it is hard to train recognition systems for them. To address this problem, synthetic handwriting can be used to expand the existing training dataset. Any available digital text, for example from online newspapers and similar sources, can be used to generate this synthetic data. The digital text is distorted in such a way that the underlying pattern is preserved, so that the word can still be identified by both a machine and a human reader. The resulting images can be used to train any classification system for handwriting recognition. This data can be used on its own to train the system, or combined with natural handwritten data to augment the original dataset and improve the accuracy of the results. Using only synthetic training data, we obtained high recognition accuracy for both character and word recognition. The approach was tested on numerals in three Indian scripts (Hindi, Bengali and Telugu) and on words in one script (Hindi), and the results achieved are highly promising.
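The abstract does not say which distortions the authors apply, only that they must keep the word recognizable. As one illustration, a common way to do this is to warp a rendered word or glyph image with a small random affine transform (rotation, horizontal shear and scaling). The sketch below assumes the digital text has already been rendered to a binary image (rendering is not shown); the function names `random_affine_distort` and `make_variants` and all parameter bounds are hypothetical choices, not the paper's method.

```python
import math
import random
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def random_affine_distort(img: Image, rng: random.Random,
                          max_rotate_deg: float = 8.0,
                          max_shear: float = 0.3,
                          scale_range: Tuple[float, float] = (0.9, 1.1)) -> Image:
    """Warp a binary image with a small random affine transform.

    Hypothetical bounds: kept small so the word stays legible.
    """
    h, w = len(img), len(img[0])
    theta = math.radians(rng.uniform(-max_rotate_deg, max_rotate_deg))
    shear = rng.uniform(-max_shear, max_shear)
    scale = rng.uniform(*scale_range)

    # Forward matrix A = scale * Rotation(theta) * ShearX(shear).
    c, s = math.cos(theta), math.sin(theta)
    a11, a12 = scale * c, scale * (c * shear - s)
    a21, a22 = scale * s, scale * (s * shear + c)
    det = a11 * a22 - a12 * a21  # equals scale**2, always > 0

    # Inverse of A, used for backward mapping (no holes in the output).
    i11, i12 = a22 / det, -a12 / det
    i21, i22 = -a21 / det, a11 / det

    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # transform about the image centre
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Source pixel that lands on (x, y), nearest-neighbour sampling.
            ix = int(round(i11 * dx + i12 * dy + cx))
            iy = int(round(i21 * dx + i22 * dy + cy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def make_variants(img: Image, n: int, seed: int = 0) -> List[Image]:
    """Produce n distorted copies of one rendered image (reproducible via seed)."""
    rng = random.Random(seed)
    return [random_affine_distort(img, rng) for _ in range(n)]
```

For example, `make_variants(word_image, 50, seed=7)` would turn one rendered word into 50 slightly different training samples. Backward mapping (computing, for each output pixel, where it came from) is used so that the warped image has no gaps; elastic or stroke-width distortions could be layered on in the same way.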
