Abstract
Urdu is a complex language that draws on many South Asian and East Asian languages, which makes its character recognition a large and difficult task. It is a bidirectional language: numerals are written from left to right while the script is written from right to left, which complicates the recognition process. This paper presents the recognition and classification of a novel Urdu numeral dataset using a convolutional neural network (CNN) and its variants. We propose a custom CNN model to extract features, which are then classified by either a Softmax layer or a support vector machine (SVM) classifier. We compare its performance with GoogLeNet and the residual network (ResNet). Our proposed CNN achieves an accuracy of 98.41% with the Softmax classifier and 99.0% with the SVM classifier, while GoogLeNet achieves 95.61% and ResNet achieves 96.4%. Moreover, we develop datasets of handwritten Urdu numerals and of numerals on Pakistani currency to reflect real-life problems. Our models achieve higher accuracies than previous models reported in the literature for optical character recognition (OCR).
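The two classifier heads mentioned above differ mainly in how they score and penalize the same feature vector. The following is a minimal sketch of that difference, not the paper's implementation: the feature vector, weights, three-class setup, and all function names are hypothetical toy values chosen for illustration, and the hinge loss uses the Weston-Watkins multiclass form as an assumed stand-in for the SVM objective.

```python
import math

def linear_scores(features, weights, biases):
    """One score per class: dot(w_k, x) + b_k."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Numerically stable softmax: turns scores into class probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """Loss typically minimized when training a Softmax head."""
    return -math.log(softmax(scores)[label])

def multiclass_hinge(scores, label, margin=1.0):
    """Margin loss for a linear SVM head (Weston-Watkins form, assumed here)."""
    return sum(max(0.0, s - scores[label] + margin)
               for k, s in enumerate(scores) if k != label)

def predict(scores):
    """Both heads predict the class with the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical 4-dimensional CNN feature vector and a toy 3-class head.
features = [0.5, -1.0, 2.0, 0.0]
weights = [[1.0, 0.0, 0.5, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [-1.0, 0.0, 0.2, 1.0]]
biases = [0.0, 0.0, 0.0]

scores = linear_scores(features, weights, biases)
probs = softmax(scores)
print("predicted class:", predict(scores))
print("softmax probabilities:", [round(p, 3) for p in probs])
print("cross-entropy (label 0):", round(cross_entropy(scores, 0), 3))
print("hinge loss (label 0):", multiclass_hinge(scores, 0))
```

In this sketch the hinge loss drops to zero once the correct class leads every other class by the margin, whereas the cross-entropy loss stays positive and keeps pushing probabilities toward one. This is one commonly cited reason the two heads can reach different accuracies on identical features.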