Abstract

This paper presents a new computational backend model that supports Arabic document information retrieval (ADIR) through dataset and OCR services. It discusses the services that support document analysis, retrieval, and processing, including dataset preparation and recognition. The ADIR services provide general Arabic OCR functions from which many other services in the OCR domain can be composed. Furthermore, the proposed work gives access to different document layout analysis methods through a platform on which such methods (services) can be shared and handled without any setup requirements. One of the datasets used is composed of 16,800 Arabic letters written by 60 writers. Each writer wrote each letter, from Alif to Ya, 10 times in two forms. The forms were scanned at a resolution of 300 DPI and segmented into two sets: a training set of 13,440 letters, with 480 images per class label, and a testing set of 3,360 letters, with 120 images per class label. A convolutional neural network (CNN) is adapted for the classification of Arabic handwritten letters. In an experimental test, our results reached a 100% classification accuracy rate on the testing images. The ADIR services also provide a “service description”, which includes an interface and a server's URL. The interface allows clients and services to communicate. In this article, we also evaluate the IR results and compare them against their corrected equivalents.
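The dataset figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes 28 class labels, one per letter of the Arabic alphabet (Alif to Ya); the variable names are illustrative and are not taken from the paper.

```python
# Sanity check of the dataset sizes quoted in the abstract.
# Assumption: 28 class labels, one per letter of the Arabic alphabet.
NUM_CLASSES = 28
NUM_WRITERS = 60
REPETITIONS_PER_LETTER = 10

# Every writer writes every letter the same number of times.
total_letters = NUM_CLASSES * NUM_WRITERS * REPETITIONS_PER_LETTER

# Split sizes as stated in the abstract.
TRAIN_LETTERS = 13_440
TEST_LETTERS = 3_360

train_per_class = TRAIN_LETTERS // NUM_CLASSES
test_per_class = TEST_LETTERS // NUM_CLASSES
train_fraction = TRAIN_LETTERS / total_letters

print(total_letters, train_per_class, test_per_class, train_fraction)
```

Under that assumption, the totals are consistent with 16,800 letters overall, and the split works out to 480 training and 120 testing images per class, an 80/20 division.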

Highlights

  • Arabic is the primary language of the Middle East, and a large volume of Arabic data is published through information applications

  • Both “segmentation” and “recognition” must be solved, which doubles the challenge. This cannot be done with irregular lines, such as the decorative writing shown in Figure 2

  • Phonetic signs: diacritics in Arabic script are an additional complication for any OCR system that recognizes Arabic text, because they do not fall within a horizontal sequence the way the letters of the alphabet do [20], as shown in Figure 7

Summary

INTRODUCTION

Arabic is the primary language of the Middle East, and a large volume of Arabic data is published through information applications. Arabic document information retrieval (ADIR) covers a very broad range of heterogeneous data and the methods used to analyze it. Many repositories aim to support document exploitation and document analysis in the digital age [3]. They need to access datasets and methods through web services, such as those based on the Simple Object Access Protocol (SOAP) [3].

Al-Barhamtoshy et al.: Arabic Documents Information Retrieval for Printed, Handwritten, and Calligraphy Image.

RELATED WORKS

OCR services have gained considerable attention in document image analysis in the digital age. Both “segmentation” and “recognition” must be solved, which doubles the challenge. This cannot be done with irregular lines, such as the decorative writing shown in Figure 2.
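To make the SOAP-based web-service access mentioned in the introduction more concrete, the sketch below builds a minimal SOAP 1.1 request envelope with Python's standard library. The service namespace, the `RecognizeText` operation, and its parameters are hypothetical placeholders chosen for illustration; the paper's actual service interface is not reproduced here.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, for illustration only.
SERVICE_NS = "http://example.org/adir"


def build_soap_request(operation: str, params: dict) -> bytes:
    """Build a SOAP 1.1 request envelope for a single operation call."""
    ET.register_namespace("soap", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    # The operation element and its arguments live in the service namespace.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        child.text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Hypothetical operation and arguments.
request_xml = build_soap_request(
    "RecognizeText",
    {"imageId": "page-001", "script": "handwritten"},
)
print(request_xml.decode("utf-8"))
```

A client would send these bytes in an HTTP POST to the server URL given in the service description; the transport step is omitted because the endpoint is not specified in the text.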

Arabic dotted
Text and non-text regions
Handwritten
EVALUATION OF THE PROPOSED MODEL
Findings
CONCLUSION