Text line extraction from handwritten document pages based on line contour estimation

R. Sarkar, M. Nasipuri, S. Malakar, N. Das, S. Halder, S. Basu

doi:10.1109/icccnt.2012.6395873

Abstract

Extraction of text lines from handwritten or printed document images is an important step in an Optical Character Recognition (OCR) system. In handwritten document images, the presence of skewed, touching, or overlapping text lines makes this step a real challenge for researchers. In the present work, a new text line extraction technique based on line contour estimation is reported. The digitized document image is first partitioned into a number of vertical fragments of equal width. All line segments present in these vertical fragments are then detected. Finally, neighboring line segments are analyzed to place each one inside the line boundary to which it actually belongs. For experimental purposes, the technique is tested on the CMATERdb1.2.1 database, where it successfully extracts 88.44% of the text lines.
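The three stages named in the abstract (equal-width vertical fragments, per-fragment line segments, and analysis of neighboring segments) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how segments are detected or how neighbors are matched, so the row-wise ink test and the greedy vertical-overlap linking below are assumptions, as are all function names and the `fragment_width` parameter. The input is assumed to be a binarized image given as a list of rows, with 1 for ink and 0 for background.

```python
from typing import List, Tuple

Image = List[List[int]]            # binary image: 1 = ink, 0 = background
Segment = Tuple[int, int]          # (top_row, bottom_row), both inclusive
Line = List[Tuple[int, Segment]]   # [(fragment_index, segment), ...]


def split_into_fragments(width: int, fragment_width: int) -> List[Tuple[int, int]]:
    """Column ranges (start, end), end exclusive, of equal-width fragments.
    The last fragment may be narrower if width is not a multiple."""
    return [(c, min(c + fragment_width, width))
            for c in range(0, width, fragment_width)]


def segments_in_fragment(image: Image, col_start: int, col_end: int) -> List[Segment]:
    """Assumed detector: maximal runs of consecutive rows that contain
    at least one ink pixel inside the fragment."""
    segments: List[Segment] = []
    run_start = None
    for r, row in enumerate(image):
        has_ink = any(row[col_start:col_end])
        if has_ink and run_start is None:
            run_start = r
        elif not has_ink and run_start is not None:
            segments.append((run_start, r - 1))
            run_start = None
    if run_start is not None:
        segments.append((run_start, len(image) - 1))
    return segments


def link_segments(per_fragment: List[List[Segment]]) -> List[Line]:
    """Assumed neighbor analysis: scanning fragments left to right, attach
    each segment to the line whose most recent segment overlaps it most
    vertically; start a new line when nothing overlaps."""
    lines: List[Line] = []
    for idx, segs in enumerate(per_fragment):
        for seg in segs:
            best, best_overlap = None, 0
            for line in lines:
                last_idx, (top, bottom) = line[-1]
                if last_idx == idx:
                    continue  # this line already took a segment from this fragment
                overlap = min(bottom, seg[1]) - max(top, seg[0]) + 1
                if overlap > best_overlap:
                    best, best_overlap = line, overlap
            if best is None:
                lines.append([(idx, seg)])
            else:
                best.append((idx, seg))
    return lines


def extract_text_lines(image: Image, fragment_width: int) -> List[Line]:
    """Run the three stages on a binary image."""
    width = len(image[0]) if image else 0
    fragments = split_into_fragments(width, fragment_width)
    per_fragment = [segments_in_fragment(image, s, e) for s, e in fragments]
    return link_segments(per_fragment)
```

Because each fragment is processed independently, a skewed line appears as a staircase of short segments whose row ranges drift from one fragment to the next, and the linking step follows that drift. This simple overlap rule does not handle touching or overlapping lines, which the abstract identifies as the hard cases.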

Full Text