Abstract

Reliable extraction/segmentation of text lines, words and characters is one of the very important steps for development of automated systems for understanding the text in low resolution display board images. In this paper, a new approach for segmentation of text lines, words and characters from Kannada text in low resolution display board images is presented. The proposed method uses projection profile features and on pixel distribution statistics for segmentation of text lines. The method also detects text lines containing consonant modifiers and merges them with corresponding text lines, and efficiently separates overlapped text lines as well. The character extraction process computes character boundaries using vertical profile features for extracting character images from every text line. Further, the word segmentation process uses k-means clustering to group inter character gaps into character and word cluster spaces, which are used to compute thresholds for extracting words. The method also takes care of variations in character and word gaps. The proposed methodology is evaluated on a data set of 1008 low resolution images of display boards containing Kannada text captured from 2 mega pixel cameras on mobile phones at various sizes 240x320, 600x800 and 900x1200. The method achieves text line segmentation accuracy of 97.17%, word segmentation accuracy of 97.54% and character extraction accuracy of 99.09%. The proposed method is tolerant to font variability, spacing variations between characters and words, absence of free segmentation path due to consonant and vowel modifiers, noise and other degradations. The experimentation with images containing overlapped text lines has given promising results.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.