Abstract
The accuracy of current natural scene text recognition algorithms is limited by the poor performance of character recognition methods for these images. Complex backgrounds, variations in writing, text size and orientation, low resolution and multi-language text make recognition of text in natural images a complex and challenging task. Conventional machine learning and deep learning-based methods have achieved satisfactory results, but character recognition for cursive text such as Arabic and Urdu scripts in natural images is still an open research problem. The characters in cursive text are connected and are difficult to segment for recognition. Variations in the shape of a character due to its position within a word make the recognition task more challenging than for non-cursive text. Optical character recognition (OCR) techniques proposed for Arabic and Urdu scanned documents perform very poorly when applied to character recognition in natural images. In this paper, we propose a multi-scale feature aggregation (MSFA) and a multi-level feature fusion (MLFF) network architecture to recognize isolated Urdu characters in natural images. The network first aggregates multi-scale features of the convolutional layers by up-sampling and addition operations and then combines them with the high-level features. Finally, the outputs of the MSFA and MLFF networks are fused to create more robust and powerful features. A comprehensive dataset of segmented Urdu characters is developed for the evaluation of the proposed network models. Synthetic text on patches of images with real natural scene backgrounds is generated to increase the number of samples of infrequently used characters. The proposed model is also evaluated on the Chars74K and ICDAR03 datasets. To validate the proposed model on the new Urdu character image dataset, we compare its performance with the histogram of oriented gradients (HoG) method. The experimental results show that aggregating multi-scale and multi-level features and fusing them is effective, and that the proposed model outperforms the other methods on the Urdu character image and Chars74K datasets.
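The up-sampling and addition step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' network: it uses plain Python lists in place of convolutional feature maps, and it assumes nearest-neighbour up-sampling, square single-channel maps and integer scale factors, none of which the abstract specifies. The function names `upsample_nearest` and `aggregate` are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Repeat each value `factor` times along both axes (nearest-neighbour)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def aggregate(feature_maps):
    """Up-sample every map to the largest size and add them element-wise.

    Assumption: maps are square and their sizes divide the largest size evenly.
    """
    target = max(len(f) for f in feature_maps)
    total = [[0.0] * target for _ in range(target)]
    for fmap in feature_maps:
        up = upsample_nearest(fmap, target // len(fmap))
        for i in range(target):
            for j in range(target):
                total[i][j] += up[i][j]
    return total


# Toy maps standing in for a shallow (4x4), a middle (2x2) and a deep (1x1) layer.
shallow = [[1.0] * 4 for _ in range(4)]
middle = [[1.0, 2.0], [3.0, 4.0]]
deep = [[10.0]]
fused = aggregate([shallow, middle, deep])
```

In a real CNN the same idea would be applied per channel to learned feature maps, and the aggregated map would then be combined with the high-level features, as the abstract describes.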
Highlights
Rapid developments in camera-based portable devices have facilitated the acquisition of a large number of images every day
To handle the challenging problem of Urdu text recognition in natural scene images, we propose a new convolutional neural network (CNN) architecture that integrates convolutional features of the network at different layers and combines them with the high-level layers to create a fused feature
To analyze the quality of the proposed method on English natural scene characters, we evaluated it on the Chars74K [33] and ICDAR03 [34] datasets and compared its F-score with those of a number of state-of-the-art character recognition methods
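For readers unfamiliar with the metric, an F-score for a multi-class character classifier can be computed as in the sketch below. The source does not say how per-class scores are averaged, so macro-averaging of per-class F1 is an assumption here, and the function name `macro_f_score` is hypothetical.

```python
from collections import Counter


def macro_f_score(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores (assumed averaging)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for c in set(y_true) | set(y_pred):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not results from the paper.
score = macro_f_score(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

With the toy labels above, class "a" has precision 1 and recall 0.5, and class "b" has precision 2/3 and recall 1, so the two per-class F1 scores are averaged with equal weight.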
Summary
Rapid developments in camera-based portable devices have facilitated the acquisition of a large number of images every day. ICDAR has published a multi-language natural scene image dataset that includes Arabic and eight other languages [14], whereas the datasets, techniques, evaluation protocols and results achieved for Chinese text detection and end-to-end recognition are reported in [13]. In these ICDAR robust reading competitions, the problem of text extraction is generally divided into four sub-tasks: (i) text detection, (ii) isolated character recognition, (iii) cropped word recognition and (iv) end-to-end text recognition. In this research study, a new dataset is created that contains images of isolated characters manually segmented from natural scene images containing Urdu text. Before this dataset is passed to the CNN classifier for classification and recognition, preprocessing operations are performed to give it a uniform and standard representation.
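The summary does not say which preprocessing operations are used. As a rough illustration of what a uniform representation could look like, the sketch below resizes a grayscale character patch to a fixed square size and scales pixel values to [0, 1]. The choice of operations, the 32-pixel default and all function names are assumptions, not details from the paper.

```python
def resize_nearest(image, size):
    """Resize a 2D grayscale image (list of rows) to size x size by nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    return [
        [image[r * h // size][c * w // size] for c in range(size)]
        for r in range(size)
    ]


def normalize(image, max_value=255):
    """Scale integer pixel intensities to floats in [0, 1]."""
    return [[px / max_value for px in row] for row in image]


def preprocess(image, size=32):
    """Assumed pipeline: fixed-size resize followed by intensity scaling."""
    return normalize(resize_nearest(image, size))


# A tiny 2x3 patch standing in for a segmented character image.
patch = [[0, 255, 255],
         [0, 0, 255]]
out = preprocess(patch, size=4)
```

Whatever the actual operations are, the point is that every sample reaches the classifier with the same shape and value range.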