Abstract

A review of the published research confirms that recognition of printed Arabic words continues to present challenges, especially when segmentation is problematic. A word-level recognition system is presented here that does not rely on any segmentation and does not require baseline detection of ascenders and descenders. A discrete Hidden Markov Model (HMM) classifier is combined with a block-based Discrete Cosine Transform (DCT) to construct a novel holistic recogniser for printed Arabic words. A balanced database of word images has been constructed to ensure an even distribution of word samples. The Arabic words are typewritten in five fonts at a size of 14 points in plain style. The system is applied to actual scanned word images, with no overlap between the training and testing datasets. Word feature vectors are extracted using the block-based DCT, and Vector Quantisation maps each feature vector to the closest symbol in the codebook. The Hidden Markov Model Toolkit (HTK) is used to construct the recogniser. The output of the system is a set of multiple recognition hypotheses (an N-best word lattice). The results are encouraging: the system achieves 97.65% accuracy on average, which is significantly higher than previously published results in this area. A detailed comparison and analysis of the results are presented.
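
The feature-extraction and quantisation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the block size of 8, the choice to keep the top-left 2 x 2 low-frequency coefficients, the row-by-row block order, and the function names `dct2`, `block_dct_features` and `quantise` are all assumptions made for illustration. The abstract does not state how the codebook is built, so it is simply passed in here.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)
    scale = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    # cos_table[k][x] = cos((2x + 1) * k * pi / (2n))
    cos_table = [
        [math.cos((2 * x + 1) * k * math.pi / (2 * n)) for x in range(n)]
        for k in range(n)
    ]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += block[x][y] * cos_table[u][x] * cos_table[v][y]
            out[u][v] = scale[u] * scale[v] * total
    return out


def block_dct_features(image, block_size=8, keep=2):
    """One feature vector per non-overlapping block.

    Assumes the image height and width are multiples of block_size.
    Keeping the top-left keep x keep coefficients is an illustrative choice.
    """
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            coeffs = dct2(block)
            features.append(
                [coeffs[u][v] for u in range(keep) for v in range(keep)]
            )
    return features


def quantise(vectors, codebook):
    """Map each vector to the index of its nearest codeword (squared Euclidean)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: dist(vec, codebook[i]))
        for vec in vectors
    ]


# Toy 8 x 16 "word image": a blank block followed by a solid block.
image = [[0] * 8 + [1] * 8 for _ in range(8)]
features = block_dct_features(image)
# Hypothetical two-entry codebook; a real one would be trained on data.
codebook = [[0.0, 0.0, 0.0, 0.0], [8.0, 0.0, 0.0, 0.0]]
symbols = quantise(features, codebook)
```

The resulting `symbols` list is the kind of discrete observation sequence a discrete HMM consumes, one symbol per image block.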
