Abstract
We present a learning-free method for text line segmentation of historical handwritten document images. The method relies on automatic scale selection together with second derivatives of anisotropic Gaussian filters to detect the blob lines that strike through the text lines. The detected blob lines guide an energy minimization procedure that extracts the text lines. Historical handwritten documents contain noise, heterogeneous text line heights, skew, and characters that touch across adjacent text lines. Automatic scale selection adapts to the heterogeneous nature of handwritten text lines, provided that the character height range is correctly estimated. In the extraction phase, the method can accurately split characters that touch across text lines. We report results for various settings and compare the method with recent learning-free and learning-based methods on the cBAD competition dataset.
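The blob line detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes horizontal text lines only, a fixed horizontal-to-vertical sigma ratio, a vertical sigma of half the candidate character height, and the usual sigma-squared scale normalization; the names `blob_line_response`, `heights`, and `elongation` are hypothetical and chosen for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def blob_line_response(ink, heights, elongation=3.0):
    """Scale-selected response to horizontally elongated blobs.

    ink        : 2-D float array, larger values where there is ink.
    heights    : candidate character heights in pixels (assumed range).
    elongation : assumed ratio of horizontal to vertical sigma.

    Returns the best response per pixel and the height that produced it.
    """
    ink = np.asarray(ink, dtype=float)
    best = np.full(ink.shape, -np.inf)
    best_height = np.zeros(ink.shape)
    for h in heights:
        sigma_y = h / 2.0  # assumption: sigma is half the character height
        # Second derivative along rows (axis 0), plain smoothing along columns.
        d2 = gaussian_filter(
            ink, sigma=(sigma_y, elongation * sigma_y), order=(2, 0)
        )
        # Multiply by sigma^2 so responses at different scales are comparable;
        # the sign flip makes ink ridges positive.
        resp = -(sigma_y ** 2) * d2
        better = resp > best
        best[better] = resp[better]
        best_height[better] = h
    return best, best_height


# Synthetic page: one horizontal band of ink, 8 pixels tall.
page = np.zeros((64, 128))
page[28:36, :] = 1.0
response, selected = blob_line_response(page, heights=[4, 8, 16])
peak_row = int(np.argmax(response[:, 64]))
```

On this synthetic band the response peaks at the band's center row and the selected height matches the band height, which is the behavior scale selection relies on. Skewed lines would require oriented filters, which this sketch leaves out.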
Highlights
Digital handwritten documents are not explorable in their raw form but need to be transcribed further into machine-readable text
This paper proposes a learning-free text line segmentation method for challenging historical handwritten documents such as the cBAD dataset
We evaluated the method on another recent handwritten text line segmentation dataset, DIVA-HisDB
Summary
Digital handwritten documents are not explorable in their raw form but need to be transcribed further into machine-readable text. There is a practical need for reliable handwritten document image processing algorithms. Text line segmentation is an essential operation and a prerequisite for many document image analysis tasks. Advances in text line segmentation performance will boost the performance of other tasks, such as word segmentation [1,2] and word recognition [3,4]. Text line detection locates each text line by its baseline or x-height representation. Text line extraction, in turn, yields a polygonal or pixel-level representation of text lines. The extraction-level representation is more precise and more useful for higher-level document image analysis tasks. With the advances in deep learning, numerous learning-based methods have been proposed for text line segmentation of handwritten documents.