Automatic detection of Telugu single and multi-character text blocks in handwritten words

N Shobha Rani, Vasudev T

doi:10.1109/coconet.2015.7411192

Abstract

Every character recognition system differs in its specific characteristics and problems. Segmentation of touching and overlapping characters in Telugu handwritten documents is an integral and pressing research problem. Solutions to such problems involve lengthy computational processing. Applying touching-character segmentation algorithms globally to all text blocks increases the time complexity of the overall Optical Character Recognition (OCR) process. The objective of the present work is therefore to reduce the computational complexity involved in segmenting the touching characters commonly found in Telugu application form documents. This paper proposes a novel approach for classifying multi-character text blocks (groups of touching or overlapping characters) and single-character text blocks (isolated characters) in Telugu handwritten words based on histogram features. First, the handwritten word is segmented into blocks based on its vertical projection profile. Each block is then partitioned into eight equal regions using percentiles. For each partition, we compute the horizontal projection profile and count the number of peaks greater than the mean of that partition. Empirical relations derived from these per-partition peak counts distinguish single-character text blocks from multi-character text blocks. The experimental results are adequate and consistent, with an overall accuracy of around 96.56%.
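The feature-extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`segment_blocks`, `count_peaks_above_mean`, `peak_counts`) are hypothetical, the input is assumed to be a binarized image with ink pixels equal to 1, and the "eight equal regions based on percentiles" are interpreted here as vertical strips cut at the 12.5%, 25%, ... percentiles of the block's column range. The empirical relations that map peak counts to a single or multi-character label are not given in the abstract, so they are left out.

```python
import numpy as np


def segment_blocks(word):
    """Split a binary word image (1 = ink) into column ranges.

    Blocks are separated wherever the vertical projection profile is zero.
    Returns a list of (start, end) column indices, end exclusive.
    """
    vpp = word.sum(axis=0)  # vertical projection profile: ink per column
    inked = vpp > 0
    blocks, start = [], None
    for x, on in enumerate(inked):
        if on and start is None:
            start = x
        elif not on and start is not None:
            blocks.append((start, x))
            start = None
    if start is not None:
        blocks.append((start, len(inked)))
    return blocks


def count_peaks_above_mean(profile):
    """Count local maxima of a 1-D profile that exceed the profile mean."""
    profile = np.asarray(profile, dtype=float)
    if profile.size == 0:
        return 0
    padded = np.concatenate(([0.0], profile, [0.0]))
    mid = padded[1:-1]
    # A peak rises above its left neighbour and does not fall below its right.
    is_peak = (mid > padded[:-2]) & (mid >= padded[2:])
    return int(np.sum(is_peak & (mid > profile.mean())))


def peak_counts(block, n_parts=8):
    """Peak-count feature for each of n_parts vertical strips of a block.

    Assumption: strip boundaries sit at evenly spaced percentiles of the
    block's column indices, which yields (near-)equal-width strips.
    """
    width = block.shape[1]
    qs = np.linspace(0, 100, n_parts + 1)
    edges = np.rint(np.percentile(np.arange(width + 1), qs)).astype(int)
    counts = []
    for a, b in zip(edges[:-1], edges[1:]):
        hpp = block[:, a:b].sum(axis=1)  # horizontal projection profile
        counts.append(count_peaks_above_mean(hpp))
    return counts
```

A typical use would call `segment_blocks` on a word image, slice each block as `word[:, start:end]`, and pass it to `peak_counts`; the resulting eight-element feature vector is what a classification rule would then consume.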

Full Text