Abstract

Segmentation is an important step in designing an Optical Character Recognition (OCR) system for any script. In this paper, we focus on line and word segmentation in typewritten Gurmukhi script documents. To perform this task, we adopt an OCR-based methodology comprising several processing steps. Typewritten documents suffer from several issues, such as noise, skew, and poor document quality. We present a combined pre-processing scheme that performs document thresholding followed by skew detection and correction: thresholding uses Niblack's method, and skew correction uses a gradient histogram algorithm to bring the text to a uniform orientation. A line segmentation scheme is then applied, in which a probability density function is used to generate a probability map of the text distribution. Because assigning text to the correct line is challenging, we present a 2D Gaussian model that helps identify the text boundaries in the x and y directions. The proposed methodology is applied to typewritten Gurmukhi documents, and an experimental study shows that the proposed approach achieves better performance than the existing techniques.
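As an illustration of the thresholding step named above, the following is a minimal sketch of Niblack's local threshold, T(x, y) = m(x, y) + k * s(x, y), where m and s are the mean and standard deviation of a window centred on each pixel. The function name `niblack_threshold`, the window size, the value k = -0.2, and the reflect padding at the borders are assumptions made for illustration; the abstract does not state the parameters used in this work.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def niblack_threshold(gray, window=15, k=-0.2):
    """Binarize a grayscale image with Niblack's local threshold.

    Returns a boolean array in which True marks pixels darker than the
    local threshold (treated here as ink on a light background).
    `window` and `k` are illustrative defaults, not values from the paper.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    gray = np.asarray(gray, dtype=np.float64)
    pad = window // 2
    # Reflect-pad so that every pixel has a full window around it.
    padded = np.pad(gray, pad, mode="reflect")
    # One (window x window) view per pixel of the original image.
    windows = sliding_window_view(padded, (window, window))
    local_mean = windows.mean(axis=(-2, -1))
    local_std = windows.std(axis=(-2, -1))
    threshold = local_mean + k * local_std
    return gray < threshold
```

The function expects a 2D array of intensities whose sides are larger than half the window. This direct form favours readability over speed; for full-page scans an integral-image formulation or a library implementation would likely be preferable.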
