Abstract

Machines are becoming increasingly skilled and capable. Although machines have no intelligence of their own, advances in Artificial Intelligence (AI) are rapidly extending what they can do. Pattern Recognition (PR) is concerned with identifying obscure patterns and assigning them to specific classes. Optical Character Recognition (OCR) is a subfield of PR that deals with the recognition of characters. Considerable work has been done on Japanese, Hindi, Arabic and Chinese scripts, but very little on Urdu script. Urdu is highly cursive and is written in different calligraphic styles such as Naskh, Nastalique, Kofi, Devani and Riqa. The Nastalique style is highly calligraphic and valued for its aesthetic beauty. Ligature segmentation of Urdu Nastalique is more difficult than in other languages because of characteristics such as stacking of ligatures and cursiveness. Cursiveness means that ligatures are joined together to form new shapes. Recognition of Urdu ligatures by an OCR system is further complicated by variations in scale, rotation, orientation and font style. In this study, a scale and rotation invariant classifier for Urdu Nastalique OCR is proposed. A combination of scale and location invariant moments is used for feature extraction, and classification is performed using a Cascade Forward Backpropagation Neural Network. The model is validated through independent dataset testing and 5-fold cross-validation, which gave accuracies of 96.474% and 96.922%, respectively. These high accuracies indicate that the proposed model is well suited to the recognition of Urdu Nastalique ligatures.
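To illustrate what scale and location invariant moments can look like in practice, the following is a minimal sketch of normalized central moments computed on a binary ligature image. It is not the authors' implementation: the function names (`raw_moment`, `normalized_central_moment`, `moment_features`), the choice of orders 2 and 3, and the helper functions `translate` and `upscale` are assumptions made for illustration only.

```python
from typing import List, Sequence

# Row-major grid of pixel intensities (assumption: 1 = ink, 0 = background).
Image = Sequence[Sequence[float]]


def raw_moment(img: Image, p: int, q: int) -> float:
    """Raw moment m_pq: sum over pixels of x^p * y^q * intensity."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def normalized_central_moment(img: Image, p: int, q: int) -> float:
    """Central moment about the centroid, divided by m00^(1 + (p+q)/2).

    Centering removes the dependence on location; the normalization
    removes (up to pixel discretization) the dependence on scale.
    """
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image contains no ink")
    cx = raw_moment(img, 1, 0) / m00
    cy = raw_moment(img, 0, 1) / m00
    mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
             for y, row in enumerate(img)
             for x, v in enumerate(row))
    return mu / m00 ** (1 + (p + q) / 2)


def moment_features(img: Image) -> List[float]:
    """Feature vector of second- and third-order normalized central moments."""
    orders = [(2, 0), (1, 1), (0, 2), (3, 0), (2, 1), (1, 2), (0, 3)]
    return [normalized_central_moment(img, p, q) for p, q in orders]


def translate(img: Image, dx: int, dy: int) -> List[List[float]]:
    """Hypothetical helper: shift the glyph right by dx and down by dy pixels."""
    width = len(img[0])
    top = [[0] * (width + dx) for _ in range(dy)]
    return top + [[0] * dx + list(row) for row in img]


def upscale(img: Image, s: int) -> List[List[float]]:
    """Hypothetical helper: enlarge the glyph by replicating each pixel s x s times."""
    return [[v for v in row for _ in range(s)] for row in img for _ in range(s)]
```

Under these assumptions, `moment_features(glyph)` and `moment_features(translate(glyph, 5, 3))` agree up to floating-point error, while `moment_features(upscale(glyph, 2))` agrees only approximately because of pixel discretization. A feature vector of this kind could then be passed to a classifier such as the neural network described above.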

Highlights

  • Urdu is the official language of Pakistan

  • Different experiments were carried out to find the accuracy of the proposed model which was based on scale and rotation invariant classifier for optical character recognition of the

  • The same Center of Language Engineering (CLE) corpus was used in order to evaluate the accuracy of the proposed model


Summary

Introduction

Urdu is the official language of Pakistan and is spoken in many countries around the globe. The word "Urdu" originates from the Turkish word "ordu", meaning camp or army [1]. The Urdu language is a combination of different languages. Urdu and Hindi share the same background. In Bangladesh, Urdu is used as a medium of communication and is referred to as Behari. Urdu has been influenced by Arabic, Persian and Turkish. It includes words from many other languages, which enhances its appeal as a poetic language.
