Abstract

Telugu is a Dravidian language spoken mainly in the southern parts of India. It has close to 81 million native speakers, making it the fifteenth most widely spoken language in the world. Here we present a comprehensive database of handwritten Telugu characters to drive progress in handwriting recognition for this script. Its significance lies in its coverage: we have put together the largest set of vowels, consonants, and vowel-consonant and consonant-consonant pairs of the Telugu orthography. The database consists of real-world offline handwritten characters extracted from scanned documents, making it the largest and most varied database in this domain. The data collection method, the preprocessing steps, and the approach used to extract individual Telugu characters are explained in detail. The dataset is also made openly available as a test set for evaluating handwriting recognition approaches and related tasks. This work also presents a method for handwritten Telugu character recognition that uses Convolutional Neural Networks as a baseline classifier and Visual Attention Networks as a more advanced and effective solution. Finally, the proposed architecture is compared with previous solutions and the results are discussed.
