Abstract

This chapter describes a complete system for recognizing unconstrained handwritten Arabic words using over-segmentation of characters and a variable duration hidden Markov model (VDHMM). First, a segmentation algorithm based on morphology and linguistic information translates the 2D image into a 1D sequence of subcharacter symbols. This sequence of symbols is modeled by a single contextual VDHMM. Written text generally carries two sources of information: shape information and linguistic information. Forty-five features are selected to represent the shape information of character and subcharacter symbols in the feature space. The shape information of each character symbol, i.e., a feature vector, is modeled either as an independently distributed multivariate discrete distribution or as a joint continuous distribution. Linguistic knowledge about character transitions is modeled as a Markov chain in which each character of the alphabet is a state and the bigram probabilities are the state transition probabilities. In this context, the variable duration state handles the segmentation ambiguity between consecutive characters. We outline the substantial effort expended to create a corpus of handwritten Arabic words and of characters extracted from these words. Using this corpus and the IFN dataset 2003, detailed experimental results are described to demonstrate the success of the proposed scheme.
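
To make the role of the variable duration state concrete, the following is a minimal sketch of Viterbi decoding over an over-segmented word, not the chapter's implementation. It assumes each character state may absorb between 1 and `max_dur` consecutive subcharacter segments, that bigram probabilities link consecutive characters, and that the shape model is available as a function scoring a merged span of segments. The function name `vdhmm_decode`, the two placeholder characters `"a"` and `"b"`, and all probability values are hypothetical and chosen only for illustration.

```python
import math

NEG_INF = -math.inf


def vdhmm_decode(n_segments, states, log_init, log_trans, log_dur, log_emit, max_dur):
    """Return (best log-probability, [(character, duration), ...]).

    log_emit(state, start, end) scores segments[start:end] merged into one character.
    """
    # best[t][s]: best log-probability of explaining the first t segments
    # when the last character is s; back[t][s] records (start, previous state).
    best = [{s: NEG_INF for s in states} for _ in range(n_segments + 1)]
    back = [{s: None for s in states} for _ in range(n_segments + 1)]

    for t in range(1, n_segments + 1):
        for s in states:
            # Try every allowed duration d for a character ending at segment t.
            for d in range(1, min(max_dur, t) + 1):
                start = t - d
                local = log_dur[s].get(d, NEG_INF) + log_emit(s, start, t)
                if start == 0:
                    # First character of the word: use the initial probability.
                    candidates = [(log_init[s] + local, None)]
                else:
                    # Later character: use the bigram transition from each predecessor.
                    candidates = [
                        (best[start][p] + log_trans[p][s] + local, p) for p in states
                    ]
                for cand, prev in candidates:
                    if cand > best[t][s]:
                        best[t][s] = cand
                        back[t][s] = (start, prev)

    last = max(states, key=lambda s: best[n_segments][s])
    score = best[n_segments][last]
    if score == NEG_INF:
        return score, []  # no valid segmentation under the given model

    # Walk the back-pointers to recover characters and their durations.
    path, t, s = [], n_segments, last
    while t > 0:
        start, prev = back[t][s]
        path.append((s, t - start))
        t, s = start, prev
    path.reverse()
    return score, path


# Toy model (all numbers are made up for illustration).
states = ["a", "b"]
log_init = {"a": math.log(0.5), "b": math.log(0.5)}
log_trans = {
    "a": {"a": math.log(0.1), "b": math.log(0.9)},
    "b": {"a": math.log(0.9), "b": math.log(0.1)},
}
log_dur = {
    "a": {1: math.log(0.2), 2: math.log(0.8)},
    "b": {1: math.log(0.9), 2: math.log(0.1)},
}
# Shape scores for merged spans; spans not listed get a small default probability.
emit_table = {("a", 0, 2): 0.9, ("b", 2, 3): 0.9}


def log_emit(state, start, end):
    return math.log(emit_table.get((state, start, end), 0.01))


score, path = vdhmm_decode(3, states, log_init, log_trans, log_dur, log_emit, max_dur=2)
print(path)  # [('a', 2), ('b', 1)]
```

In this toy run, three segments are explained as a two-segment character followed by a one-segment character, so the segmentation and the character labels are chosen jointly rather than fixed in advance. A real system would replace `emit_table` with the feature-based discrete or continuous shape model described above.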
