A segmentation-free approach to Arabic and Urdu OCR

Nazly Sabbour, Faisal Shafait

doi:10.1117/12.2003731

Abstract

In this paper, we present Nabocr, a generic Optical Character Recognition (OCR) system for Arabic script languages. Nabocr uses OCR approaches specific to Arabic script recognition. Recognizing Arabic script text is more difficult than recognizing Latin text because Arabic script is cursive and context-sensitive. Moreover, Arabic script has different writing styles that vary in complexity. Nabocr is initially trained to recognize both Urdu Nastaleeq and Arabic Naskh fonts, and users can train it for other Arabic script languages. We have evaluated our system's performance for both Urdu and Arabic. To evaluate Urdu recognition, we generated a dataset of Urdu text called UPTI (Urdu Printed Text Image Database), which measures different aspects of a recognition system. On clean text, our system's performance is 91% for Urdu and 86% for Arabic. Moreover, we have compared the performance of our system against Tesseract's newly released Arabic recognition, and the performance of both systems on clean images is almost the same.


