Abstract
Text recognition in natural scene images is a challenging problem in computer vision. Unlike conventional optical character recognition (OCR), text recognition in natural scene images is more complex due to variations in text size, colors, fonts and orientations, complex backgrounds, occlusion, and uneven illumination. In this paper, we propose a segmentation-free method based on a deep convolutional recurrent neural network to solve the problem of cursive text recognition, focusing particularly on Urdu text in natural scenes. Compared to non-cursive scripts, Urdu text recognition is more complex due to variations in writing style, multiple shapes of the same character, connected text, ligature overlapping, and stretched, diagonal and condensed text. The proposed model takes a whole word image as input, without pre-segmentation into individual characters, and transforms it into a sequence of relevant features. Our model consists of three components: a deep convolutional neural network (CNN) with shortcut connections to extract and encode the features, a recurrent neural network (RNN) to decode the convolutional features, and a connectionist temporal classification (CTC) layer to map the predicted sequences onto the target labels. To further increase recognition accuracy, we explore deeper CNN architectures such as VGG-16, VGG-19, ResNet-18 and ResNet-34 to extract more discriminative Urdu text features, and compare the recognition results. To conduct the experiments, a new large-scale benchmark dataset of cropped Urdu word images in natural scenes is developed. The experimental results show that the proposed deep CRNN with shortcut connections outperforms the other network architectures. The dataset is publicly available and can be downloaded from https://data.mendeley.com/datasets/k5fz57zd9z/1
Highlights
Text in natural scene images contains rich and valuable information of great importance to several real-world applications, such as automatic license plate recognition, content-based image or video retrieval, geo-location, assisting visually impaired people, robot navigation, street and road sign recognition, and image understanding [1]–[3]
We propose a segmentation-free deep convolutional recurrent neural network (CRNN) to recognise the cropped Urdu word image text in natural scene images
The framework is based on three components: (1) the convolutional neural network (CNN) component for feature extraction, (2) the recurrent neural network (RNN) component to decode the feature sequences into per-frame predictions and (3) the transcription component to map the per-frame predictions into the target labels
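The transcription component named above can be illustrated with a minimal greedy CTC decoder. This is a hedged sketch, not the paper's implementation: the function name, the blank index of 0, and the integer label frames are all illustrative assumptions.

```python
def ctc_greedy_decode(frame_predictions, blank=0):
    """Map per-frame predictions to target labels the CTC way:
    collapse consecutive repeats, then drop the blank symbol."""
    decoded = []
    previous = None
    for symbol in frame_predictions:
        # Keep a symbol only when it differs from the previous frame
        # and is not the CTC blank.
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return decoded

# Hypothetical per-frame argmax indices emitted by the RNN over 10 steps:
frames = [0, 3, 3, 0, 0, 7, 7, 7, 0, 2]
print(ctc_greedy_decode(frames))  # -> [3, 7, 2]
```

In practice a beam-search decoder can replace this greedy pass, but the collapse-then-remove-blank rule is the same mapping the CTC layer trains the network to satisfy.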
Summary
Text in natural scene images contains rich and valuable information of great importance to several real-world applications, such as automatic license plate recognition, content-based image or video retrieval, geo-location, assisting visually impaired people, robot navigation, street and road sign recognition, and image understanding [1]–[3]. While significant work has been performed on handwritten, printed and artificial text in Arabic and Urdu scripts, the recognition of Arabic and Urdu text in natural scene images has not yet demonstrated comparable results [14], [15]. We propose a segmentation-free deep CRNN to recognise cropped Urdu word images in natural scenes. The main contributions of this paper are summarised as follows: 1) Several deep CNN structures, including VGG-16, VGG-19, ResNet-18 and ResNet-34, are explored and modified for the challenging problem of cursive text recognition in natural scene images.
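The hand-off between the CNN and RNN components described above can be sketched as a simple reshape: each column of the final CNN feature map becomes one time step of the sequence fed to the RNN. The shapes below are illustrative assumptions, not the paper's exact dimensions.

```python
import numpy as np

# Hypothetical CNN output for a batch of 2 word images:
# 512 channels, height collapsed to 1, width 25 after pooling.
batch, channels, height, width = 2, 512, 1, 25
feature_map = np.random.rand(batch, channels, height, width)

# Each of the `width` columns becomes one time step; the channel and
# height dimensions are flattened into that frame's feature vector.
sequence = feature_map.reshape(batch, channels * height, width)
sequence = sequence.transpose(2, 0, 1)  # (time_steps, batch, features)

print(sequence.shape)  # -> (25, 2, 512)
```

Because Urdu words are written horizontally, slicing the feature map column by column preserves the left-to-right frame order that the recurrent decoder and the CTC layer both assume.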