Arabic Documents Information Retrieval for Printed, Handwritten, and Calligraphy Image

Hassanin M Al-Barhamtoshy, Sherif M Abdou, Kamal M Jambi, Mohsen A Rashwan

doi:10.1109/access.2021.3066477


Open Access

https://doi.org/10.1109/access.2021.3066477


Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 27	License type: CC BY 4.0

Affiliation: King Abdulaziz University, Cairo University

Abstract

This paper presents a new computational backend model that supports Arabic document information retrieval (ADIR) as a dataset and a set of OCR services. It discusses the services that support document analysis, retrieval, and processing, including dataset preparation and recognition. The ADIR services provide the general functions of Arabic OCR, from which many other services in the OCR domain can be composed. Furthermore, the proposed work gives access to different document layout analysis methods through a platform where such methods (services) can be shared and handled without any setup requirements. One of the datasets used is composed of 16,800 Arabic letters written by 60 writers. Each writer wrote each letter from Alif to Ya 10 times on two forms. The forms were scanned at 300 DPI and segmented into two sets: a training set with 13,440 letters (480 images per class label) and a testing set with 3,360 letters (120 images per class label). A convolutional neural network (CNN) is adapted for classifying Arabic handwritten letters. In an experimental test, the reported classification accuracy on the testing images reaches 100%. The ADIR services also provide a "service description", which includes an interface and a server's URL. The interface allows communication between clients and services. Finally, this article evaluates the IR results and compares them with their corrected equivalents.
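The dataset figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes 28 letter classes (Alif to Ya), a count the abstract does not state explicitly; the constant and function names are illustrative and do not come from the paper.

```python
# Assumption: the Arabic alphabet from Alif to Ya gives 28 class labels.
NUM_CLASSES = 28
NUM_WRITERS = 60     # stated in the abstract
REPETITIONS = 10     # each writer wrote each letter 10 times

TRAIN_TOTAL = 13_440  # stated in the abstract
TEST_TOTAL = 3_360    # stated in the abstract


def per_class(total_images: int, num_classes: int = NUM_CLASSES) -> int:
    """Return images per class, insisting on an even split across classes."""
    if total_images % num_classes != 0:
        raise ValueError("total does not divide evenly across the classes")
    return total_images // num_classes


dataset_total = NUM_WRITERS * NUM_CLASSES * REPETITIONS
train_per_class = per_class(TRAIN_TOTAL)
test_per_class = per_class(TEST_TOTAL)
train_fraction = TRAIN_TOTAL / dataset_total

print(dataset_total)     # 16800
print(train_per_class)   # 480
print(test_per_class)    # 120
```

Under that assumption the totals are mutually consistent: the two sets add up to 16,800 letters, and the split corresponds to roughly 80% training and 20% testing data.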

Highlights

Arabic is the primary language in the Middle East, and a large volume of data in it is posted in information applications
Both "segmentation" and "recognition" must be solved, which doubles the challenge. This cannot be done with irregular lines, such as the decorative lines shown in figure 2
Phoneme signs: diacritics in Arabic script are an additional complication for any OCR system recognizing Arabic text, because they do not fall into a horizontal sequence the way the letters of the alphabet do [20], as shown in figure 7
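The point about diacritics sitting outside the horizontal letter sequence has a rough analogue in encoded text, where Arabic diacritics are stored as Unicode combining marks attached to a base letter. The following is a minimal sketch of that idea using Python's standard `unicodedata` module. It works on text strings, not on document images, and the helper names `split_diacritics` and `strip_diacritics` are hypothetical; it is not the paper's OCR method.

```python
import unicodedata


def split_diacritics(text: str) -> list[tuple[str, str]]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[tuple[str, str]] = []
    for ch in text:
        # A non-zero canonical combining class means the character
        # stacks on the previous base letter instead of advancing the line.
        if unicodedata.combining(ch) and clusters:
            base, marks = clusters[-1]
            clusters[-1] = (base, marks + ch)
        else:
            clusters.append((ch, ""))
    return clusters


def strip_diacritics(text: str) -> str:
    """Keep only the base characters, i.e. the horizontal letter sequence."""
    return "".join(base for base, _ in split_diacritics(text))


# Kaf, Ta, Ba, each followed by a fatha (U+064E).
word = "\u0643\u064E\u062A\u064E\u0628\u064E"

print(len(word))                    # 6 code points
print(len(split_diacritics(word)))  # 3 base letters, each with one mark
print(strip_diacritics(word) == "\u0643\u062A\u0628")  # True
```

In an image-based pipeline the same separation has to be inferred from pixel layout rather than read from code points, which is where the added difficulty described above comes from.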

Summary

INTRODUCTION

Arabic is the primary language in the Middle East, and a large volume of data in it is posted in information applications. Arabic document information retrieval (ADIR) covers a very broad range of heterogeneous data and related analysis methods. Many repositories aim at document exploitation and document analysis in the digital age [3]. They need to access datasets and methods through web services (such as the Simple Object Access Protocol (SOAP)) [3].

RELATED WORKS

OCR services have gained much attention in document image analysis in the digital age. Both "segmentation" and "recognition" must be solved, which doubles the challenge. This cannot be done with irregular lines, such as the decorative lines shown in figure 2.

Arabic dotted

Text and non-text regions

Handwritten

EVALUATION OF THE PROPOSED MODEL

Findings

CONCLUSION
