Experimental evaluation of Arabic OCR systems

Mansoor Alghamdi, William Teahan

doi:10.1108/prr-05-2017-0026

Open Access

https://doi.org/10.1108/prr-05-2017-0026

Journal: PSU Research Review
Publication Date: Nov 28, 2017
Citations: 17
License type: cc-by

Affiliation: University of Tabuk, Bangor University

Abstract

Purpose: The aim of this paper is to experimentally evaluate the effectiveness of state-of-the-art printed Arabic text recognition systems to determine open areas for future improvements. In addition, this paper proposes a standard protocol with a set of metrics for measuring the effectiveness of Arabic optical character recognition (OCR) systems to assist researchers in comparing different Arabic OCR approaches.

Design/methodology/approach: This paper describes an experiment to automatically evaluate four well-known Arabic OCR systems using a set of performance metrics. The evaluation experiment is conducted on a publicly available printed Arabic dataset comprising 240 text images with a variety of resolution levels, font types, font styles and font sizes.

Findings: The experimental results show that the field of character recognition for printed Arabic still requires further research to reach an efficient text recognition method for Arabic script.

Originality/value: To the best of the authors’ knowledge, this is the first work that provides a comprehensive automated evaluation of Arabic OCR systems with respect to the characteristics of Arabic script and, in addition, proposes an evaluation methodology that can be used as a benchmark by researchers and therefore will contribute significantly to the enhancement of the field of Arabic script recognition.
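The abstract does not name the performance metrics used. As an illustration only, the sketch below computes a character error rate (CER) from the Levenshtein edit distance between a ground-truth transcription and an OCR output, a measure commonly used in OCR evaluation. The function names `edit_distance` and `character_error_rate` are hypothetical, and nothing here should be read as the authors' protocol.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of `reference`
    # and the first j characters of `hypothesis`.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the reference length.

    Assumption: characters are counted as Unicode code points, so Arabic
    diacritics count as separate characters. A real evaluation might
    normalise the text or count grapheme clusters instead.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One missing letter out of four in the ground truth.
    print(character_error_rate("كتاب", "كتب"))  # 0.25
```

A CER of 0 means the OCR output matches the ground truth exactly. Values can exceed 1 when the output contains many spurious characters.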

Highlights

Optical character recognition (OCR) is a technique that aims to automatically convert a machine-printed or handwritten text image into an editable text format (Alghamdi et al., 2016).
Our evaluation study is limited to the four most well-known Arabic OCR systems, namely, Automatic Reader 11.2 produced by the Sakhr Software Company; FineReader 12 produced by the ABBYY Company; Clever Page produced by RDI (Research & Development International) and Tesseract produced originally by Hewlett-Packard (HP).
Experimental results and discussion: The experimental results, obtained from the evaluation experiment discussed in the previous section, are presented to analyse the effectiveness of the evaluated Arabic OCR systems in printed Arabic text recognition.

Summary

Introduction

Optical character recognition (OCR) is a technique that aims to automatically convert a machine-printed or handwritten text image into an editable text format (Alghamdi et al., 2016). This technique is highly desirable in various real-world applications, such as digitising learning resources to assist visually impaired people, bank cheque processing and mail sorting (Alginahi, 2013; Al-Badr and Mahmoud, 1995). The process for developing OCR systems involves five stages: pre-processing, segmentation, feature extraction, classification and post-processing. Each stage applies its own specific techniques; for more details, see Khorsheed (2002).

Similar Papers

High-Performance Printed Arabic Optical Character Recognition System Using ANN Classifier
Basheer Al-Sadawi ... Ahmed Hussain
01 Sep 2021

A Holistic Technique for an Arabic OCR System
Farhan Nashwan ... Sherif Abdou
Journal of Imaging | VOL. 4
27 Dec 2017

Arabic Character Recognition: Progress and Challenges
Pervez Ahmed ... Yousef Al-Ohali
Journal of King Saud University - Computer and Information Sciences | VOL. 12
01 Jan 1999

An Efficient Language-Independent Multi-Font OCR for Arabic Script
Hussein Osman ... Seifeldin Elsehely
28 Nov 2020
