Two Stream Deep Neural Network for Sequence-Based Urdu Ligature Recognition

Syed Yasser Arafat, Muhammad Javed Iqbal

doi:10.1109/access.2019.2950537

Open Access

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 10
License type: CC BY 4.0

Affiliation: University of Engineering and Technology Taxila

Abstract

Urdu is written in a complex cursive script that challenges OCR systems because of its large number of ligatures and its cursive style. Several techniques for recognizing Urdu ligatures have been proposed in the literature. However, our investigation indicates that suitably challenging datasets, and consequently higher recognition rates on them, are still needed for ligature recognition. In this paper, a hybrid model based on the holistic approach is adopted for the recognition of Urdu ligatures (compound characters). More than 3800 unique ligatures were used to generate 46K (38K training, 7K testing) synthetic ligature images, applying 9 different kinds of transformations in addition to the normal ligatures. Each ligature is processed through two streams of deep neural networks, namely AlexNet and VGG16, to obtain a distinct set of features from each network. These features are fused and then used as input to a two-layer Bidirectional Long Short-Term Memory (BLSTM) network for learning a model. The learned model maps ligature images to their corresponding sequences of individual Urdu characters, so the output of the proposed methodology is in editable Urdu-script format. The proposed model was evaluated and showed an accuracy of 97% on the training dataset and 80% on more than 7K parametrically different query ligatures (the test set).
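The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that fusion means concatenating the per-step feature vectors of the two streams and that both streams yield the same number of steps; the abstract does not specify either detail. The function name fuse_streams and the toy feature values are hypothetical, and the real AlexNet and VGG16 feature sizes are not given here.

```python
from typing import List

Vector = List[float]


def fuse_streams(stream_a: List[Vector], stream_b: List[Vector]) -> List[Vector]:
    """Concatenate per-step feature vectors from two streams.

    Assumption: fusion is plain concatenation and both streams produce
    the same number of steps; the abstract confirms neither detail.
    """
    if len(stream_a) != len(stream_b):
        raise ValueError("both streams must produce the same number of steps")
    return [a + b for a, b in zip(stream_a, stream_b)]


# Hypothetical toy features: 3 steps, 2 values from one stream, 3 from the other.
alexnet_like = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
vgg16_like = [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5], [1.6, 1.7, 1.8]]

fused = fuse_streams(alexnet_like, vgg16_like)
print(len(fused), len(fused[0]))  # 3 steps, each with 2 + 3 = 5 values
```

Under this assumption, the fused sequence is what a sequence model such as the BLSTM would receive, one concatenated vector per step.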

Highlights

Urdu is the national language of Pakistan and is also used in six Indian states [1], together covering more than 260 million people.
Two kinds of synthetic images were generated from the ligature text of the CLE dataset, one for training and another for testing.
We performed t-SNE visualization to understand the complexity of the dataset.
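The training-set size quoted in the abstract can be sanity-checked with a short sketch. It assumes that every unique ligature is rendered once in its normal form and once per transformation (9 transformations, so 10 variants each), which would be consistent with the reported 38K training images but is not confirmed by the source. The ligature and transform labels below are hypothetical placeholders, since only the counts are given.

```python
from itertools import product


def enumerate_variants(ligatures, transforms):
    """Pair every ligature with every variant label (normal + each transform)."""
    variants = ["normal"] + list(transforms)
    return list(product(ligatures, variants))


# Hypothetical placeholders: the source gives counts, not names.
ligatures = [f"ligature_{i}" for i in range(3800)]
transforms = [f"transform_{k}" for k in range(1, 10)]

training_plan = enumerate_variants(ligatures, transforms)
print(len(training_plan))  # 3800 * (1 + 9) = 38000
```

Because the abstract says "more than 3800" ligatures, the exact count would be slightly higher than this round figure.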

Summary

Introduction

Urdu is the national language of Pakistan and is also used in six Indian states [1], together covering more than 260 million people. Script recognition is an essential part of any simple or photo OCR system. OCR systems are generally divided into two categories: offline systems [1]–[3] and online systems [4], [5]. Offline systems recognize text at a later stage, essentially from printed or photographed text, while online systems recognize text as it is written, usually on tablets or smartphones. An OCR system for Urdu must cope with different writing styles, ligatures of multiple sizes, and image degradations. Along with these variations, the presence of diacritics in Urdu script results in low recognition rates [6], [7]. Urdu has two commonly used writing styles, Naskh and Nastalique [8], among others.

Similar Papers

Dataset of Urduud1k from Natural Scenes
U Zaki ... M.A Zaki
SINDH UNIVERSITY RESEARCH JOURNAL -SCIENCE SERIES | VOL. 51
10 Dec 2019

Urdu-Text Detection and Recognition in Natural Scene Images Using Deep Learning
Syed Yasser Arafat ... Muhammad Javed Iqbal
IEEE Access | VOL. 8
01 Jan 2020

Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images.
Asghar Ali Chandio ... Mehwish Leghari
Data in Brief | VOL. 31
21 May 2020

A Sentimental Analysis of Legal Documents using Deep Learning Approach
Shunmuga Lakshmi Priya K ... Kalaiselvi S
13 Dec 2022
