Nastalique segmentation-based approach for Urdu OCR

Sarmad Hussain, Salman Ali, Qurat Ul Ain Akram

doi:10.1007/s10032-015-0250-2

Abstract

Much work on Arabic language optical character recognition (OCR) has focused on the Naskh writing style. The Nastalique style, used for most of the languages written in Arabic script across Southern Asia, is much more challenging to process due to its compactness, cursiveness, higher context sensitivity and diagonality. These properties make Nastalique writing more complex, with multiple letters horizontally overlapping each other. For these reasons, existing methods used for Naskh would not work for Nastalique, and therefore most work on Nastalique has used non-segmentation methods. The current paper presents a new approach to segmentation-based analysis for the Nastalique style. The paper explains the complexity of Nastalique and why Naskh-based techniques cannot work for it, and proposes a segmentation-based method for developing Nastalique OCR, deriving principles and techniques for pre-processing and recognition. The OCR is developed for the Urdu language. The system is optimized using 79,093 instances of 5,249 main bodies derived from a corpus of 18 million words, giving a recognition accuracy of 97.11%. The system is then tested on document images of books, with a main body recognition accuracy of 87.44%. The work is extensible to other languages using Nastalique.

Journal: International Journal on Document Analysis and Recognition (IJDAR)
Publication Date: Aug 13, 2015
Citations: 50

Similar Papers

Optical Character Recognition Development Using Python
Prakhar Sisodia
Journal of Informatics Electrical and Electronics Engineering (JIEEE) | VOL. 4
01 Jan 2023

Egyptian car plate recognition based on YOLOv8, Easy-OCR, and CNN
Amany Sarhan ... Mohamed Ramadan
Journal of Electrical Systems and Information Technology | VOL. 11
12 Aug 2024

Vietnamese Scene Text Detection and Recognition using Deep Learning: An Empirical Study
Nhat Truong Pham ... Duc Ngoc Minh Dang
29 Jul 2022

A Robust OCR for Degraded Documents
Kapil Dev Dhingra ... Pramod Kumar Sharma
01 Jan 2008
