Abstract
Urdu is written in a cursive script and belongs to the family of non-Latin scripts, alongside Arabic, Chinese, and Hindi. Detecting and localizing Urdu text in natural scene images is challenging, and so, consequently, is recognizing the individual ligatures in those images. This paper proposes a methodology that covers detection, orientation prediction, and recognition of Urdu ligatures in outdoor images. As a first step, a custom FasterRCNN algorithm is used in conjunction with well-known CNNs (Squeezenet, Googlenet, Resnet18, and Resnet50) to detect and localize text in images of size $320\times 240$ pixels. For ligature orientation prediction, a custom Regression Residual Neural Network (RRNN) is trained and tested on datasets containing randomly oriented ligatures. Ligatures are recognized using a Two Stream Deep Neural Network (TSDNN). In our experiments, five datasets of 4.2K and 51K synthetic images with embedded Urdu text were generated from the CLE annotation text to evaluate detection, orientation prediction, and ligature recognition. The 4.2K and 51K image sets contain 132 and 1600 unique ligatures respectively, with 32 variations of each ligature (4 backgrounds and 8 font colors). In addition, 1094 real-world images containing more than 12K Urdu characters were used to evaluate the TSDNN. Finally, all four detectors were compared on their ability to detect and localize Urdu text using average precision (AP). The FasterRCNN based on Resnet50 features was the best detector, with an AP of 0.98, while the Squeezenet, Googlenet, and Resnet18 based detectors had testing APs of 0.65, 0.88, and 0.87 respectively. The RRNN achieved accuracies of 79% and 99% on the 4.2K and 51K images respectively. Similarly, for character classification in ligatures, the TSDNN attained partial sequence recognition rates of 94.90% and 95.20% on the 4.2K and 51K images respectively, and 76.60% on the real-world images.
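The dataset sizes quoted in the abstract can be checked against the ligature and variation counts. The sketch below is only a sanity check, not the authors' generation pipeline: it assumes one synthetic image per (ligature, background, font color) combination, and the names `UNIQUE_LIGATURES`, `variations`, and `dataset_size` are hypothetical.

```python
from itertools import product

# Counts stated in the abstract.
N_BACKGROUNDS = 4
N_FONT_COLORS = 8
UNIQUE_LIGATURES = {"small": 132, "large": 1600}


def variations():
    """Enumerate (background, font color) index pairs: 4 x 8 = 32."""
    return list(product(range(N_BACKGROUNDS), range(N_FONT_COLORS)))


def dataset_size(n_ligatures):
    """Assumption: one image per (ligature, background, font color) combination."""
    return n_ligatures * len(variations())


sizes = {name: dataset_size(n) for name, n in UNIQUE_LIGATURES.items()}
print(sizes)  # {'small': 4224, 'large': 51200}
```

Under that assumption the totals come to 4224 and 51200 images, which matches the rounded 4.2K and 51K figures.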
Highlights
As autonomous vehicles and other intelligent devices such as cell phones and mobile devices [1] and robots [2] come online, they need to understand the environment in which they operate
We presented a novel methodology for Urdu text, covering the entire spectrum of text detection, orientation prediction, and recognition
To determine the best Convolutional Neural Network (CNN) for Urdu text detection, the first dataset of 4.2K synthetic images was used as input to the CNNs. Four different CNNs are used as feature extractors to train four FasterRCNN models
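The four FasterRCNN detectors are compared by average precision (AP). The source does not state the exact AP protocol, so the following is a minimal sketch under assumptions: a detection counts as correct when its IoU with an unmatched ground-truth box is at least 0.5, and AP is the un-interpolated mean of the precision values at each correct detection. The function names and the input layout are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """Un-interpolated AP for one class.

    detections: list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    The 0.5 IoU threshold is an assumption, not taken from the paper.
    """
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp, fp, ap = 0, 0, 0.0
    # Visit detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            if matched[img][j]:
                continue  # each ground-truth box may be matched only once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold:
            matched[img][best_j] = True
            tp += 1
            ap += (tp / (tp + fp)) / total_gt  # precision at this recall step
        else:
            fp += 1
    return ap


# Toy example on 320x240 images: two ligature boxes, three detections.
gt = {"img_a": [(10, 10, 60, 40)], "img_b": [(100, 50, 180, 90)]}
dets = [
    ("img_a", 0.9, (10, 10, 60, 40)),     # correct
    ("img_a", 0.8, (200, 150, 260, 200)),  # false positive
    ("img_b", 0.7, (100, 50, 180, 90)),    # correct
]
print(round(average_precision(dets, gt), 4))  # 0.8333
```

With this formulation a detector that finds every ligature before producing any false positive scores 1.0, so the reported values of 0.65 to 0.98 would be read on the same 0 to 1 scale.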
Summary
As autonomous vehicles and other intelligent devices such as cell phones and mobile devices [1] and robots [2] come online, they need to understand the environment in which they operate. Outdoor Urdu text datasets are needed so that different kinds of learning models can be evaluated for their effectiveness at text detection, orientation prediction, and ligature recognition. J. Iqbal: Urdu-Text Detection and Recognition in Natural Scene Images Using Deep Learning. This paper performs a comprehensive analysis of CNN features to determine their effectiveness for Urdu text detection in outdoor pictures at the ligature level, for predicting the orientations of ligatures, and for recognizing individual characters in a text. It is the first study in the Urdu text/script detection literature to use four different kinds of CNN features for detection. Urdu ligatures are recognized with a TSDNN that uses Resnet and Googlenet features together with a BLSTM, on both synthetic and real outdoor images.