A Hybrid Method for Text Line Extraction in Handwritten Document Images

Ehsan Kiumarsi,Alireza Alaei

doi:10.1109/icfhr-2018.2018.00050

Abstract

Text line segmentation in handwritten document image, as one of the preliminarily steps for document image recognition, is a challenging problem. In this paper, a hybrid method for text line extraction in handwritten document images is presented. Initially, a connected component (CC) labelling method following by a CC filtering is employed to extract a set of CCs from the input document image. A new distance measure is introduced to compute normal distances between the extracted CCs. By traversing the normal distance matrix from both the right and left directions, half-chains of CCs are constructed. The CCs half-chains are merged to obtain CCs full-chains. From the extracted full-chains separator lines are obtained. A gradient metric is proposed to detect and remove touching text lines. Using remaining separator lines the adaptive projection profile of the image is computed. Based on the projection profile, coarse text line extraction is performed. Finally, a fine text lines extraction is performed by applying a postprocessing step. To evaluate the method, two benchmarks named ICDAR2013 handwriting segmentation contest, and Kannada datasets composed of handwritten document images in English, Greek, Bengali, and Kannada languages were considered for experimentation. Experimental results indicate a promising performance was obtained compared to some of the state-of-the-art methods.

Full Text