Urdu Optical Character Recognition Systems: Present Contributions and Future Directions

Naila Habib Khan, Awais Adnan

doi:10.1109/access.2018.2865532

Abstract

This paper presents a comprehensive review and survey of the most prominent studies in the field of Urdu optical character recognition (OCR). It introduces OCR technology and gives a historical review of OCR systems, comparing English, Arabic, and Urdu systems. It also provides detailed background and literature on the Urdu script, covering the script's history, OCR categories, and OCR phases. The paper further reports all state-of-the-art studies for each phase of an Urdu OCR system, namely image acquisition, pre-processing, segmentation, feature extraction, classification/recognition, and post-processing. The segmentation section emphasizes the analytical and holistic approaches for Urdu text. The feature extraction section compares the feature learning and feature engineering approaches. Both deep learning and traditional machine learning approaches are discussed. Urdu numeral recognition systems are also covered briefly. The paper concludes by identifying some open problems and suggesting future directions.
