A Novel Segmentation Technique for Printed Malayalam Characters

Bindu Philip, R D Sudhaker Samuel, C R Venugopal

doi:10.7763/ijcee.2010.v2.217

Abstract

Segmenting whole characters in Indian scripts is a difficult problem. Characters in these scripts carry modifiers (subscripts, superscripts, and attached or non-attached vowel signs placed before or after the base character), which combine into complex composite characters, so a single character is often composed of several sub-characters. Unlike in other South Indian scripts such as Kannada, Telugu and Tamil, in the Malayalam script the space between the sub-characters of a character is the same as the space between characters within a word. Conventional profiling methods therefore fail, which makes character segmentation quite complex. This paper presents a novel algorithm for segmenting characters in such complex cases, taking Malayalam as a typical example. The success of the algorithm is demonstrated by applying feature extraction to the segmented characters and then classifying them. A segmentation efficiency of 98.8% is achieved, which is very encouraging.
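To illustrate why conventional profiling struggles here, the following is a minimal sketch of a vertical projection profile splitter applied to a toy binarized line. It is not the algorithm proposed in the paper; the function name `profile_segments`, the toy image, and the gap widths are hypothetical and chosen only for illustration. When the gap inside a composite character equals the gap between characters, the profile alone cannot tell the two apart.

```python
import numpy as np

def profile_segments(binary_line):
    """Split a binarized text line (1 = ink, 0 = background) at empty columns.

    Returns a list of (start, end) column ranges, end exclusive.
    """
    profile = binary_line.sum(axis=0)   # vertical projection profile
    ink = profile > 0                   # True where a column contains ink
    segments = []
    start = None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                   # a new run of ink begins
        elif not has_ink and start is not None:
            segments.append((start, x)) # the run ends at an empty column
            start = None
    if start is not None:
        segments.append((start, len(ink)))
    return segments

# Hypothetical toy line: a composite character made of two sub-characters,
# followed by a second character, all separated by the same 2-column gap.
line = np.zeros((5, 13), dtype=int)
line[:, 0:3] = 1    # sub-character 1 of the composite (e.g. a vowel sign)
line[:, 5:8] = 1    # sub-character 2 of the composite (base character)
line[:, 10:13] = 1  # the next, separate character

segments = profile_segments(line)
gaps = [b[0] - a[1] for a, b in zip(segments, segments[1:])]
print(segments)  # three pieces, although only two whole characters exist
print(gaps)      # equal gaps, so gap width cannot separate the two cases
```

In this toy case the splitter returns three pieces with identical gaps, so a gap-width threshold cannot regroup the first two into one character. Some additional cue beyond the profile is needed to do that.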
