Abstract

The recognition of Arabic script and its derivatives, such as Urdu, Persian and Pashto, is a difficult task due to the complexity of the script. Urdu text recognition is particularly difficult because of its Nasta’liq writing style. Nasta’liq has a complex calligraphic nature that poses major problems for Urdu text recognition: diagonal writing, high cursiveness, context sensitivity and overlapping characters. Therefore, work done on Arabic script recognition cannot be applied directly to Urdu. We present Multi-dimensional Long Short Term Memory (MDLSTM) Recurrent Neural Networks, with an output layer designed for sequence labeling, for the recognition of printed Urdu text lines written in the Nasta’liq style. Experiments show that MDLSTM attained a recognition accuracy of 98% on unconstrained printed Urdu Nasta’liq text, which significantly outperforms state-of-the-art techniques.
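The abstract does not name the sequence-labeling output layer. A common choice for LSTM text-line recognizers is connectionist temporal classification (CTC), and the sketch below assumes that choice purely for illustration. It shows how per-time-step class scores could be collapsed into a character sequence by best-path decoding: take the top class at each step, merge consecutive repeats, then drop the blank class. The function name `best_path_decode`, the blank index `0` and the toy scores are hypothetical and not taken from the paper.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the "no label" (blank) class


def best_path_decode(frame_scores, blank=BLANK):
    """Collapse per-frame class scores into a label sequence.

    frame_scores: list of rows, one row of class scores per time step.
    """
    # 1. Pick the highest-scoring class at each time step.
    path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # 2. Merge consecutive repeats, then 3. drop blank labels.
    return [label for label, _ in groupby(path) if label != blank]


# Toy input: 3 classes (0 = blank), 6 time steps.
scores = [
    [0.10, 0.80, 0.10],  # class 1
    [0.20, 0.70, 0.10],  # class 1 (repeat, merged with the previous step)
    [0.90, 0.05, 0.05],  # blank (separates two genuine occurrences of 1)
    [0.10, 0.80, 0.10],  # class 1
    [0.10, 0.20, 0.70],  # class 2
    [0.10, 0.10, 0.80],  # class 2 (repeat, merged)
]
print(best_path_decode(scores))  # prints [1, 1, 2]
```

The blank class is what lets a decoder of this kind emit the same character twice in a row, which matters for a cursive script where one character can span many pixel columns.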

Highlights

  • Advances in image processing and computational intelligence have led to significant progress in character recognition applications for complex scripts

  • We investigate Multi-dimensional Long Short Term Memory (MDLSTM) using raw pixels for Urdu Nasta’liq recognition

  • We present the experimental design for Urdu Nasta’liq text-line recognition


Summary

Introduction

Advances in image processing and computational intelligence have led to significant progress in character recognition applications for complex scripts. Several OCR systems, such as ABBYY FineReader, MeOCR, JOCR3 and Tesseract (Smith 2007), have been developed in the commercial and open-source domains for the recognition of Asian scripts like Chinese, Japanese, and Korean. Recognition of Arabic script derivatives like Nasta’liq is further complicated by their calligraphic nature (Naz et al 2014a). We point out these complexities to show that the work done on Arabic script recognition is not suitable for the Urdu Nasta’liq script (cf. “Urdu–Nasta’liq script” section).

