Zone Segmentation of a Text Line Printed in Gurmukhi Script Newspaper

Rupinder Pal Kaur, Munish Kumar, M K Jindal

doi:10.1109/pdgc.2018.8745796

Abstract

Newspapers contain essential information, and considerable effort has gone into the digitization and recognition of newspaper text. A few Gurmukhi script newspaper articles are available in digital form, but text cannot be searched in digital images. Text processing is therefore required to make the text searchable and to index headlines. To recognize any text, segmenting it into individual lines is an important phase. Owing to the characteristics of Gurmukhi script, a text line can be divided into three zones: the upper zone, the middle zone, and the lower zone. Segmenting an individual line into these zones is thus a preliminary phase of text segmentation. Zone division is possible through detection of the headline and the baseline. Baseline detection is a tedious task because of the uneven presence of on and off pixels along the baseline. In this paper, the authors present an algorithm for zone segmentation of Gurmukhi script newspaper text based on the headline and the baseline.
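To make the idea of headline-and-baseline zoning concrete, the following is a minimal sketch based on a horizontal projection profile. It is not the algorithm presented in the paper, whose details are not given in the abstract. It assumes a binarized text-line image stored as a list of rows with 1 for an on (ink) pixel, takes the headline to be the row with the most on pixels, and takes the baseline to be the row below the headline after which the ink count drops most sharply. The function names `horizontal_projection` and `segment_zones` are hypothetical.

```python
def horizontal_projection(line_image):
    """Count on (ink) pixels in each row of a binary text-line image."""
    return [sum(row) for row in line_image]


def segment_zones(line_image):
    """Split a binary text-line image (1 = ink) into three zones.

    Illustrative heuristic only (an assumption, not the paper's method):
    - headline: the row with the highest ink count
    - baseline: the row below the headline that is followed by the
      sharpest drop in ink count
    """
    profile = horizontal_projection(line_image)
    n = len(profile)

    # The headline (top bar joining the characters) is usually the densest row.
    headline = max(range(n), key=lambda r: profile[r])

    # Default to the last row, which means "no lower zone found".
    baseline = n - 1
    best_drop = 0
    for r in range(headline + 1, n - 1):
        drop = profile[r] - profile[r + 1]
        if drop > best_drop:
            best_drop = drop
            baseline = r

    return {
        "headline": headline,
        "baseline": baseline,
        "upper": line_image[:headline],                # rows above the headline
        "middle": line_image[headline:baseline + 1],   # headline through baseline
        "lower": line_image[baseline + 1:],            # rows below the baseline
    }
```

A single largest-drop rule like this is fragile for exactly the reason the abstract gives: on and off pixels are unevenly distributed along the baseline, so real newspaper scans would likely need smoothing or additional constraints.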

Full Text