A Structural Approach for Segmentation of Unconstrained Handwritten Hindi Words

Soumen Bag, A Shiva Rama Krishna

doi:10.1109/eait.2014.31

Abstract

Segmentation of printed or handwritten words into characters is an important preprocessing step for optical character recognition (OCR) systems, because incorrectly segmented characters are less likely to be recognized correctly. Scripts that are fully cursive in nature are difficult to segment. Hindi, like almost all other Indian languages, shares this feature, which makes character segmentation particularly challenging. The main challenge in handwritten character segmentation is the inherent variability in the writing styles of different individuals. In this paper, we propose an efficient character segmentation algorithm for handwritten Hindi words. Segmentation is performed on the basis of structural patterns observed in handwritten Hindi words. Our algorithm can cope with high variation in writing style and with skewed header lines in the input. The algorithm has been tested on a database prepared for experimental use, where it achieves an average success rate of 97.09%. On this database, the method yields fairly good results compared with other existing methods. It can be considered a significant preprocessing step towards the development of a handwritten Devanagari OCR system.

Full Text