Abstract

Recognition of cursive handwritten Arabic text is a difficult problem because of context-sensitive character shapes, non-uniform spacing between and within words, diverse placement of dots and diacritics, and very low inter-class variation. In this paper, we review and investigate different deep learning architectures and modeling choices for Arabic handwriting recognition. We also address the problem that imbalanced data sets pose for deep learning systems: we present a novel adaptive data-augmentation algorithm that promotes class diversity. The algorithm assigns each word in the database lexicon a weight calculated from the average probability of each class in that word. Experimental results on the IFN/ENIT and AHDB databases show that the presented approach yields state-of-the-art results.
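The word-weighting idea can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: a "class" is taken to be a character label, its probability is its relative frequency in the training labels, and a word's weight is the normalized inverse of the average probability of its characters, so that words built from rare classes are favored for augmentation. The function names are hypothetical and Latin letters stand in for Arabic character labels.

```python
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each character class over all training labels."""
    counts = Counter(ch for word in labels for ch in word)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def word_weights(labels):
    """Assign each distinct word a weight from its average class probability.

    ASSUMPTION: the weight is the normalized inverse of the average
    probability. The abstract only says the weight is "based on" that average.
    """
    probs = class_probabilities(labels)
    inverse = {}
    for word in set(labels):
        if not word:
            continue  # skip empty labels
        average = sum(probs[ch] for ch in word) / len(word)
        inverse[word] = 1.0 / average
    norm = sum(inverse.values())
    return {word: value / norm for word, value in inverse.items()}


# Toy training labels: "a" is frequent, "b" is rare.
labels = ["aa", "ab", "aa", "aa"]
weights = word_weights(labels)
```

Under these assumptions the word containing the rare class "b" receives the larger weight, so an augmentation step that samples words in proportion to `weights` would generate more synthetic samples for it.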

Highlights

  • Optical Character Recognition (OCR) is an old field of Pattern Recognition (PR)

  • The contributions of the paper include the following: (1) we present and evaluate efficient deep learning architectures for Arabic handwriting recognition

  • The IFN/ENIT database, published by Pechwitz and Maergner [21], is the most widely used database for handwritten Arabic text recognition research


Summary

INTRODUCTION

Optical Character Recognition (OCR) is an old field of Pattern Recognition (PR). The human reading process is the inspiration behind the development of a machine capable of reading texts with the same expertise as people.

M. Eltay et al.: Exploring Deep Learning Approaches to Recognize Handwritten Arabic Texts

[...] capabilities of handwriting recognition systems. The use of Connectionist Temporal Classification (CTC) in connection with RNNs allows recognition without prior segmentation [17]. This has made it possible to recognize offline handwritten text with neural network-based classifiers, an approach that has gained popularity in recent years.

Rabi et al. [29] proposed a cursive handwritten Arabic text recognition system based on Hidden Markov Models (HMMs). This system is analytical and uses embedded training to improve the character models without explicit segmentation. Ahmad and Fink [31] presented a new way of representing Arabic characters by separating the core shapes from the diacritics and representing these core shapes through smaller units called sub-core shapes.
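To illustrate why CTC removes the need for prior segmentation, the sketch below shows best-path (greedy) CTC decoding: the network emits one probability vector per frame, and the label sequence is obtained by taking the most likely class per frame, merging consecutive repeats, and dropping the blank symbol. This is a simplified stand-in and not the decoder used in the paper (the outline lists a token passing decoder). The function name, the blank index of 0, and the toy alphabet are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding over per-frame class probabilities.

    frame_probs: list of lists, one probability vector per time frame.
    alphabet:    list mapping class index to symbol (index `blank` is the blank).
    """
    # Most likely class index in each frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    decoded = []
    previous = None
    for index in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank; a blank between repeats keeps both.
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


# Toy example with classes: 0 = blank, 1 = "a", 2 = "b".
alphabet = ["-", "a", "b"]
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # a (new symbol after the blank)
    [0.1, 0.1, 0.8],    # b
    [0.1, 0.2, 0.7],    # b (repeat, merged)
    [0.8, 0.1, 0.1],    # blank
]
result = ctc_greedy_decode(frames, alphabet)  # "aab"
```

No frame-to-character alignment is supplied anywhere: the blank symbol and the repeat-merging rule let the same output be produced from many different frame-level paths.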

BACKGROUND
ARABIC HANDWRITING RECOGNITION MODELING OPTIONS
DEEP LEARNING APPROACHES
TOKEN PASSING DECODER
A NOVEL ADAPTIVE DATA AUGMENTATION ALGORITHM
RESULTS OF USING ADAPTIVE DATA AUGMENTATION
CONCLUSION