Abstract

Segmenting a handwritten document image into text lines is a crucial stage towards unconstrained handwritten document recognition. Many scripts are used for communication in the Indian subcontinent, so a system for multi-script handwritten text-line segmentation is essential there. This paper presents a multi-script text-line segmentation algorithm based on newly developed light projection, start point detection, and boundary tracking methods. The proposed approach is capable of overcoming most of the hindrances faced by state-of-the-art methods. Experiments were performed on our proposed Bangla handwritten document image dataset, WBSUBNdb_text, and on a variety of well-known public handwritten datasets: CMATERdb, PhDIndic_11, KHATT, HIT-MW, the ISI Bengali Writer Identification/Verification dataset, the ICDAR 2013 segmentation contest dataset, and the ICDAR 2013 writer identification contest benchmark dataset. The results obtained are promising.
