Abstract
Text recognition in natural scene images is a challenging problem in computer vision. Unlike conventional optical character recognition (OCR), text recognition in natural scene images is more complex due to variations in text size, colors, fonts and orientations, complex backgrounds, occlusion, and uneven illumination. In this paper, we propose a segmentation-free method based on a deep convolutional recurrent neural network to solve the problem of cursive text recognition, focusing particularly on Urdu text in natural scenes. Compared to non-cursive scripts, Urdu text recognition is more complex due to variations in writing style, multiple shapes of the same character, connected text, ligature overlapping, and stretched, diagonal and condensed text. The proposed model takes a whole word image as input, without pre-segmentation into individual characters, and transforms it into a sequence of relevant features. Our model consists of three components: a deep convolutional neural network (CNN) with shortcut connections to extract and encode the features, a recurrent neural network (RNN) to decode the convolutional features, and a connectionist temporal classification (CTC) layer to map the predicted sequences onto the target labels. To further increase recognition accuracy, we explore deeper CNN architectures such as VGG-16, VGG-19, ResNet-18 and ResNet-34 to extract more discriminative Urdu text features, and compare the recognition results. To conduct the experiments, a new large-scale benchmark dataset of cropped Urdu word images in natural scenes is developed. The experimental results show that the proposed deep CRNN with shortcut connections outperforms the other network architectures. The dataset is publicly available and can be downloaded from https://data.mendeley.com/datasets/k5fz57zd9z/1
Highlights
Text in natural scene images contains rich and valuable information of great importance to several real-world applications, such as automatic license plate recognition, content-based image or video retrieval, geo-location, assisting visually impaired people, robot navigation, street and road sign recognition, and image understanding [1]–[3]
We propose a segmentation-free deep convolutional recurrent neural network (CRNN) to recognise the cropped Urdu word image text in natural scene images
The framework is based on three components: (1) the convolutional neural network (CNN) component for feature extraction, (2) the recurrent neural network (RNN) component to decode the feature sequences into per-frame predictions and (3) the transcription component to map the per-frame predictions into the target labels
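The transcription component named above can be illustrated with a minimal greedy CTC decoder. This is a hedged sketch, not the paper's implementation: the function name, the blank index of 0, and the integer label frames are all illustrative assumptions.

```python
def ctc_greedy_decode(frame_predictions, blank=0):
    """Map per-frame predictions to target labels the CTC way:
    collapse consecutive repeats, then drop the blank symbol."""
    decoded = []
    previous = None
    for symbol in frame_predictions:
        # Keep a symbol only when it differs from the previous frame
        # and is not the CTC blank.
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return decoded

# Hypothetical per-frame argmax indices emitted by the RNN over 10 steps:
frames = [0, 3, 3, 0, 0, 7, 7, 7, 0, 2]
print(ctc_greedy_decode(frames))  # -> [3, 7, 2]
```

In practice a beam-search decoder can replace this greedy pass, but the collapse-then-remove-blank rule is the same mapping the CTC layer trains the network to satisfy.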
Summary
Text in natural scene images contains rich and valuable information of great importance to several real-world applications, such as automatic license plate recognition, content-based image or video retrieval, geo-location, assisting visually impaired people, robot navigation, street and road sign recognition, and image understanding [1]–[3]. While significant work has been performed on handwritten, printed and artificial text in Arabic and Urdu scripts, the recognition of Arabic and Urdu text in natural scene images has not yet demonstrated comparable results [14], [15]. We propose a segmentation-free deep CRNN to recognise cropped Urdu word images in natural scenes. The main contributions of this paper are summarised as follows: 1) Several deep CNN structures, including VGG-16, VGG-19, ResNet-18 and ResNet-34, are explored and modified for the challenging problem of cursive text recognition in natural scene images.
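The hand-off between the CNN and RNN components described above can be sketched as a simple reshape: each column of the final CNN feature map becomes one time step of the sequence fed to the RNN. The shapes below are illustrative assumptions, not the paper's exact dimensions.

```python
import numpy as np

# Hypothetical CNN output for a batch of 2 word images:
# 512 channels, height collapsed to 1, width 25 after pooling.
batch, channels, height, width = 2, 512, 1, 25
feature_map = np.random.rand(batch, channels, height, width)

# Each of the `width` columns becomes one time step; the channel and
# height dimensions are flattened into that frame's feature vector.
sequence = feature_map.reshape(batch, channels * height, width)
sequence = sequence.transpose(2, 0, 1)  # (time_steps, batch, features)

print(sequence.shape)  # -> (25, 2, 512)
```

Because Urdu words are written horizontally, slicing the feature map column by column preserves the left-to-right frame order that the recurrent decoder and the CTC layer both assume.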