Abstract

Problem statement: Offline recognition of handwritten Arabic text still lacks accurate solutions. Most Arabic letters have secondary components that are important for recognizing them. However, these components show large writing variations. We targeted the feature extraction stage of handwritten Arabic text recognition. Approach: In this study, we proposed a novel feature extraction approach for handwritten Arabic letters. Pre-segmented letters were first partitioned into a main body and secondary components. Moment features were then extracted from the whole letter as well as from the main body and the secondary components. Using a multi-objective genetic algorithm, efficient feature subsets were selected. Finally, the feature subsets were evaluated according to their classification error using an SVM classifier. Results: The proposed approach reduced the classification error in all cases studied. For example, for 20-feature subsets of normalized central moments and Zernike moments, the improvements were 15% and 10%, respectively. Conclusion/Recommendations: Extracting and selecting statistical features from handwritten Arabic letters, their main bodies and their secondary components provided feature subsets that give higher recognition accuracies than subsets drawn from the whole letters alone.

Highlights

  • Arabic letters are used in about 27 writing languages including Arabic, Persian, Kurdish, Urdu and Jawi[1]

  • Some progress has been made on recognizing handwritten Arabic text samples of limited vocabulary (e.g., IFN/ENIT database of handwritten Tunisian town names[4])

  • We propose a new technique to extract statistical features of handwritten Arabic letters


Introduction

Arabic letters are used in about 27 writing languages including Arabic, Persian, Kurdish, Urdu and Jawi[1]. Offline recognition of handwritten cursive text such as Arabic text is an active research problem[2,3]. Offline recognition of unconstrained handwritten cursive text must overcome many difficulties, such as unlimited variation in human handwriting, similarities between distinct character shapes, character overlaps and interconnections of neighboring characters. Some progress has been made on recognizing handwritten Arabic text samples of limited vocabulary (e.g., the IFN/ENIT database of handwritten Tunisian town names[4]). In the ICDAR Arabic handwriting recognition competitions held in 2005 and 2007[5,6], the best systems' accuracies on the IFN/ENIT database improved from 76% to 87%. Recognition accuracy for unlimited vocabulary is still unacceptable for many applications.
