Abstract

We describe an offline unconstrained Arabic handwritten word recognition system based on a segmentation-free approach and discrete hidden Markov models (HMMs) with explicit state duration. Character durations play a significant part in the recognition of cursive handwriting, yet duration information is still mostly disregarded in HMM-based automatic cursive handwriting recognizers because classical HMMs model character durations poorly. We show experimentally that explicit state duration modeling in the HMM framework can significantly improve the discriminating capacity of HMMs on very difficult pattern recognition tasks such as unconstrained Arabic handwriting recognition. To carry out letter and word model training and recognition more efficiently, we propose a new version of the Viterbi algorithm that takes explicit state duration modeling into account. Three distributions (Gamma, Gauss, and Poisson) are used for the explicit state duration modeling and are compared. To perform word recognition, the system uses an original sliding window approach based on vertical projection histogram analysis of the word and extracts a new pertinent set of statistical and structural features from the word image. Several experiments have been performed on the IFN/ENIT benchmark database, and the best recognition performance achieved by our system exceeds the results recently reported on the same database.

Highlights

  • The term handwriting recognition (HWR) refers to the process of transforming a language, which is presented in its spatial form of graphical marks, into its symbolic representation

  • This paper describes an extended version of an offline unconstrained Arabic handwritten word recognition system based on a segmentation-free approach and discrete hidden Markov models (HMMs) with explicit state duration [24]

  • The input image goes through the steps of preprocessing, feature extraction, vector quantization and classification. The classification stage uses a discrete observation sequence derived from the input image according to a sliding window approach, a tree-structured lexicon, and a database of HMMs with explicit state duration, each of which is related to a lexicon entry
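As a rough illustration of how a word image can become a discrete observation sequence, the sketch below computes a vertical projection histogram, cuts it into frames, and maps each frame to the index of its nearest codeword. It is only a simplified sketch under stated assumptions: the fixed-width frames, the use of raw column counts as features, and the names `vertical_projection`, `sliding_windows`, and `quantize` are all hypothetical and are not the paper's windowing scheme or feature set.

```python
def vertical_projection(image):
    """Count ink pixels (value 1) in each column of a binary image (list of rows)."""
    return [sum(column) for column in zip(*image)]


def sliding_windows(profile, width, step):
    """Cut the column profile into fixed-width frames (a simplifying assumption)."""
    return [profile[s:s + width] for s in range(0, len(profile) - width + 1, step)]


def quantize(vector, codebook):
    """Return the index of the nearest codeword under squared Euclidean distance."""
    def dist(k):
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[k]))
    return min(range(len(codebook)), key=dist)


# Toy 3x6 binary "word image" and a hypothetical two-entry codebook.
image = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 1, 1],
]
codebook = [[0, 3], [2, 0]]

profile = vertical_projection(image)
frames = sliding_windows(profile, width=2, step=2)
symbols = [quantize(frame, codebook) for frame in frames]
print(profile, frames, symbols)
```

In a real system the codebook would be learned from training feature vectors, and each frame would carry a much richer feature vector than these raw column counts.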

Summary

INTRODUCTION

The term handwriting recognition (HWR) refers to the process of transforming a language, which is presented in its spatial form of graphical marks, into its symbolic representation. This paper describes an extended version of an offline unconstrained Arabic handwritten word recognition system based on a segmentation-free approach and discrete HMMs with explicit state duration [24]. Significant experiments have been performed on the IFN/ENIT benchmark database [26]. They show a substantial improvement in the recognition rate when HMMs with explicit state duration, of either discrete or continuous distribution, are used instead of classical HMMs (i.e., with implicit state duration; cf. Section 3.2). The HMM parameter selection is discussed, and the resulting performance is presented with respect to the state duration distribution type, the scheme for segmenting the word into frames, and the word model training method.
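To make the difference between implicit and explicit state duration concrete: in a classical HMM a state with self-transition probability a lasts d frames with geometric probability a^(d-1)(1-a), whereas an explicit-duration HMM removes the self-loops and attaches a duration distribution to each state. The sketch below is a generic textbook-style Viterbi decoder for such a model, not the modified algorithm proposed in the paper. The Poisson distribution shifted to start at one frame, the cap `max_dur`, and all function names and toy parameters are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def poisson_log_pmf(d, mean):
    """Log-probability of a duration of d >= 1 frames under a Poisson shifted by one."""
    k = d - 1
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def duration_viterbi(obs, log_pi, log_a, log_b, log_dur, max_dur):
    """Return (best log-score, [(state, duration), ...]) for an explicit-duration HMM.

    log_a[i][j] is used only for i != j (no self-loops); log_dur(j, d) scores
    state j lasting d frames; log_b[j][o] scores symbol o in state j.
    """
    T, N = len(obs), len(log_pi)
    # delta[t][j]: best score of any path whose segment in state j ends at frame t.
    delta = [[NEG_INF] * N for _ in range(T + 1)]
    back = [[None] * N for _ in range(T + 1)]
    for t in range(1, T + 1):
        for j in range(N):
            emit = 0.0
            for d in range(1, min(max_dur, t) + 1):
                emit += log_b[j][obs[t - d]]  # extend the segment one frame back
                seg = emit + log_dur(j, d)
                start = t - d
                if start == 0:
                    cands = [(log_pi[j] + seg, None)]
                else:
                    cands = [(delta[start][i] + log_a[i][j] + seg, i)
                             for i in range(N) if i != j]
                for score, prev in cands:
                    if score > delta[t][j]:
                        delta[t][j] = score
                        back[t][j] = (prev, d)
    # Backtrack from the best final state, one whole segment at a time.
    j = max(range(N), key=lambda s: delta[T][s])
    best, t, segments = delta[T][j], T, []
    while t > 0:
        prev, d = back[t][j]
        segments.append((j, d))
        t, j = t - d, prev
    segments.reverse()
    return best, segments


# Toy two-state model over symbols {0, 1}; all numbers are illustrative.
log_pi = [math.log(0.9), math.log(0.1)]
log_a = [[NEG_INF, 0.0], [0.0, NEG_INF]]
log_b = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
best, segments = duration_viterbi(
    [0, 0, 0, 1, 1], log_pi, log_a, log_b,
    lambda j, d: poisson_log_pmf(d, 2.0), max_dur=5)
print(best, segments)
```

Compared with the standard Viterbi recursion, the extra loop over d multiplies the cost by roughly `max_dur`, which is the usual price of explicit duration modeling and one reason an efficient decoder matters.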

RELATED WORKS
Duration modeling in the HMM framework
Discrete distribution
Continuous distribution
The modified Viterbi algorithm
SYSTEM ARCHITECTURE
PREPROCESSING
FEATURE EXTRACTION AND VECTOR QUANTIZATION
Statistical features
Structural features
Vector quantization
WORD MODEL TRAINING AND CLASSIFICATION
RESULTS AND DISCUSSIONS
CONCLUSION