Handwritten Arabic Optical Character Recognition Approach Based on Hybrid Whale Optimization Algorithm With Neighborhood Rough Set

Ahmed Talat Sahlol, Mohammed A A Al-Qaness, Sunghwan Kim, Mohamed Abd Elaziz

doi:10.1109/access.2020.2970438

Abstract

Achieving high recognition performance is one of the most important goals for handwritten Arabic character recognition systems. In general, Optical Character Recognition (OCR) systems consist of four phases: pre-processing, feature extraction, feature selection, and classification. Recent literature has focused on the selection of appropriate features as a key step towards building a successful character recognition system. In this paper, we propose a hybrid machine learning approach that combines neighborhood rough sets with a binary whale optimization algorithm to select the most appropriate features for the recognition of handwritten Arabic characters. To validate the proposed approach, we used the CENPARMI dataset, a well-known dataset for machine learning experiments involving handwritten Arabic characters. The results show clear advantages of the proposed approach in recognition accuracy, memory footprint, and processor time compared with results obtained without the proposed method. When compared with other recent state-of-the-art optimization algorithms, the proposed approach outperformed all of them in all experiments. Moreover, the proposed approach achieves the highest recognition rate with the lowest time consumption compared to deep neural networks such as VGGnet, Resnet, Nasnet, Mobilenet, Inception, and Xception. The proposed approach was also compared with recently published works using the same dataset, which further confirmed its advantages in classification accuracy and time consumption. The misclassified cases were studied and analyzed; the analysis showed that they would likely be confusing even for native Arabic readers, because correctly interpreting the characters requires the context in which they appear.

Highlights

In character recognition systems, many solutions have been constructed for different languages, such as English, Japanese, and Chinese; relatively little progress has been made for the Arabic language
We present the steps of the BWOA-Neighborhood Rough Sets (NRS) algorithm in Algorithm 1
The proposed approach was implemented in MATLAB for preprocessing, feature extraction, and feature selection by the BWOA-NRS algorithm, while classification was performed using Python
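
To make the feature selection idea in the highlights more concrete, the following is a minimal sketch of how a neighborhood rough set dependency can score a binary feature mask, which is the kind of candidate a binary whale optimizer would propose. It is not the paper's Algorithm 1: the Euclidean distance, the radius `delta`, the weight `alpha`, the function names, and the toy data are all assumptions chosen for illustration, and the whale position update itself is omitted.

```python
from math import sqrt
from typing import Sequence

# Toy data (hypothetical): feature 0 separates the classes,
# feature 1 is uninformative, feature 2 is constant.
SAMPLES = [
    [0.0, 0.5, 1.0],
    [0.1, 0.1, 1.0],
    [0.9, 0.5, 1.0],
    [1.0, 0.1, 1.0],
]
LABELS = [0, 0, 1, 1]


def neighborhood_dependency(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    mask: Sequence[int],
    delta: float,
) -> float:
    """Fraction of samples whose delta-neighborhood, measured only on the
    features selected by `mask`, contains a single class label."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty feature subset cannot discriminate anything
    consistent = 0
    for i, x in enumerate(samples):
        pure = True
        for k, y in enumerate(samples):
            dist = sqrt(sum((x[j] - y[j]) ** 2 for j in selected))
            if dist <= delta and labels[k] != labels[i]:
                pure = False  # a neighbor with a different label was found
                break
        if pure:
            consistent += 1
    return consistent / len(samples)


def fitness(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    mask: Sequence[int],
    delta: float = 0.2,
    alpha: float = 0.9,
) -> float:
    """Assumed trade-off: reward high dependency and small feature subsets."""
    dependency = neighborhood_dependency(samples, labels, mask, delta)
    kept_ratio = sum(mask) / len(mask)
    return alpha * dependency + (1.0 - alpha) * (1.0 - kept_ratio)


if __name__ == "__main__":
    for candidate in ([1, 0, 0], [0, 1, 0], [1, 1, 1]):
        print(candidate, fitness(SAMPLES, LABELS, candidate))
```

Under these assumptions, a mask that keeps only the discriminative feature scores higher than one that keeps every feature, which is the pressure toward compact feature subsets that a wrapper-style search relies on.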

Summary

Introduction

In character recognition systems, many solutions have been constructed for different languages, such as English, Japanese, and Chinese; relatively little progress has been made for the Arabic language. The recognition of handwritten Arabic characters is still a current and relatively unaddressed research problem. The digitization of Arabic documents can open windows for the processing (indexing, searching, ...) of historic and Islamic documents [1]. Earlier efforts to digitize Arabic text have encountered several issues. The alphabet consists of 28 characters, which carry varying types and numbers of dots: one, two, or three. There are several writing styles for each

The associate editor coordinating the review of this manuscript and approving it for publication was Shiqiang Wang.

Objectives

Results

Conclusion