Text line extraction for historical document images using steerable directional filters

Omar Alaql, Cheng Chang Lu

doi:10.1109/icalip.2014.7009807

Abstract

Vast amounts of valuable historical documents held in libraries and various National Archives have not yet been exploited electronically. Historical documents pose specific difficulties compared with other types of handwritten documents: because of their low quality and complexity, their analysis remains an open research field. One of the major steps in analyzing these documents is automatic text line extraction, which directly influences the accuracy of text recognition. One of the best-known approaches to text line extraction was proposed by the Center for Unified Biometrics and Sensors (CUBS). Building on the concepts of the CUBS approach, this paper proposes an approach for extracting text lines from historical document images. The proposed approach is based on three local connectivity maps. The first holds the orientation angles of the text lines and is generated with a dynamic steerable directional filter. A mode filter is applied to this map to determine the paragraph map of the document. Based on the values of the paragraph map, the adaptive local connectivity map (ALCM) is generated with a static steerable directional filter to estimate the locations of the text lines. The proposed approach solves the ALCM binarization problem of the CUBS approach and, in addition to segmenting text lines, extracts the paragraphs of the document.
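The abstract names the three maps but gives no formulas, so the following is only a minimal NumPy/SciPy sketch of how data might flow between them, under our own assumptions and not the authors' implementation. We assume (1) the steerable directional filter can be modeled as a normalized line-segment averaging kernel rotated to each candidate angle, with "dynamic" meaning the angle is chosen per text pixel as the strongest response; (2) the mode filter takes the most frequent orientation in a square window to form the paragraph map; and (3) "static" means the ALCM at each pixel uses the single angle prescribed by its paragraph-map entry. The helper names (`line_kernel`, `orientation_map`, `mode_filter`, `alcm`, `synthetic_page`, `extract_lines`), the `NO_TEXT` sentinel, the window sizes, the angle set, and the final global threshold are all hypothetical; the threshold in particular is a placeholder and does not reproduce the paper's solution to ALCM binarization.

```python
import numpy as np
from scipy import ndimage

# Hypothetical sentinel marking pixels that carry no orientation estimate.
NO_TEXT = 999


def line_kernel(length, angle_deg):
    """Normalized line-segment kernel rotated by angle_deg (0 = horizontal).

    Assumption: the directional filter is modeled as an average along a thin line.
    """
    half = length // 2
    k = np.zeros((2 * half + 1, 2 * half + 1))
    t = np.deg2rad(angle_deg)
    # Sample densely along the segment so no pixel on the line is skipped.
    for r in np.linspace(-half, half, 4 * length):
        col = int(round(half + r * np.cos(t)))
        row = int(round(half - r * np.sin(t)))  # image rows grow downward
        k[row, col] = 1.0
    return k / k.sum()


def orientation_map(img, angles, length=31):
    """'Dynamic' filtering (assumed): per text pixel, the angle with max response."""
    responses = np.stack(
        [ndimage.correlate(img, line_kernel(length, a), mode="constant") for a in angles]
    )
    theta = np.asarray(angles)[np.argmax(responses, axis=0)]
    theta[img == 0] = NO_TEXT  # orientation is only estimated on text pixels
    return theta


def mode_filter(theta, size=21):
    """Paragraph map: most frequent orientation in each size x size window."""
    def _mode(window):
        window = window[window != NO_TEXT]
        if window.size == 0:
            return NO_TEXT  # no text pixels nearby
        vals, counts = np.unique(window, return_counts=True)
        return vals[np.argmax(counts)]  # ties resolve to the smallest angle

    return ndimage.generic_filter(theta, _mode, size=size, mode="constant", cval=NO_TEXT)


def alcm(img, para, length=31):
    """'Static' filtering (assumed): one fixed angle per paragraph-map value."""
    out = np.zeros(img.shape)
    for a in np.unique(para):
        if a == NO_TEXT:
            continue  # background far from any text stays at zero
        resp = ndimage.correlate(img, line_kernel(length, a), mode="constant")
        mask = para == a
        out[mask] = resp[mask]
    return out


def synthetic_page():
    """Toy binary page (1 = ink): two horizontal lines of 4-px-wide 'characters'."""
    img = np.zeros((60, 120))
    for top in (15, 40):
        for x0 in range(10, 110, 7):
            img[top:top + 3, x0:x0 + 4] = 1.0
    return img


def extract_lines(img, angles=(-20, -10, 0, 10, 20), thr=0.4):
    theta = orientation_map(img, angles)
    para = mode_filter(theta)
    conn = alcm(img, para)
    # Placeholder global threshold, NOT the paper's ALCM binarization.
    labels, n_lines = ndimage.label(conn > thr)
    return para, conn, labels, n_lines


if __name__ == "__main__":
    para, conn, labels, n_lines = extract_lines(synthetic_page())
    print("text lines found:", n_lines)
```

On the toy page the paragraph map should come out horizontal (angle 0) around both lines and undefined in the gap between them, and thresholding the ALCM should leave one connected band per line. Because `generic_filter` calls a Python function per pixel, this sketch is slow on full-size scans; it is meant to show how the orientation map, paragraph map, and ALCM feed into one another, not to be efficient.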

Full Text