
Binarization plays a central role in document image processing. It is usually performed at the preprocessing stage and is important for downstream tasks such as optical character recognition (OCR). Segmenting text from badly degraded document images is challenging because of the high inter- and intra-image variation between the document background and the foreground text. This paper presents a method for segmenting the foreground text from the background.

The method first constructs a high-contrast image, for which a rough estimate of the background is made. A hybrid thresholding algorithm that combines global and local thresholding is then applied. The global thresholding step has been modified so that its output is not a binarized image but an intermediate gray-level image; this helps because most of the background is eliminated at this stage. Local thresholding is then applied to the output of the global thresholding step.

The method is simple, robust, and effective. It performs better than most existing local and global thresholding algorithms and can handle degradations caused by stains, ink bleed-through, low contrast, watermarks, dust, smears, and uneven illumination. It has been tested on three public datasets used in the recent document image binarization contests (DIBCO) 2009 and 2011 and handwritten DIBCO (H-DIBCO) 2011, and it achieves results that are significantly higher than or close to those of the best-performing methods reported in the three contests. To further show the superior performance of the proposed method compared with other techniques, experiments have also been performed on the more challenging Bickley diary dataset.
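The pipeline described above (background estimate, high-contrast image, modified global threshold that outputs a gray-level image, then local threshold) can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation: the abstract does not say which estimators are used, so the background is approximated with a grey-level closing (a max filter followed by a min filter), the high-contrast image is formed by dividing the input by that background, Otsu's method stands in for the global step, and Sauvola's method stands in for the local step. All function names, the window size of 15, and Sauvola's `k = 0.2` and `R = 128` are hypothetical choices for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def window_filter(img, size, reducer):
    """Apply `reducer` (e.g. np.max, np.mean) over a size x size window around each pixel."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")               # replicate border pixels
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return reducer(windows, axis=(-2, -1))


def estimate_background(img, size=15):
    """Assumption: rough background via grey-level closing (max then min filter).
    Dark strokes narrower than `size` are erased, leaving the paper intensity."""
    dilated = window_filter(img, size, np.max)
    return window_filter(dilated, size, np.min)


def contrast_image(img, background):
    """Assumption: divide out the background so paper is near 255 and text stays dark."""
    ratio = img.astype(float) / np.maximum(background.astype(float), 1.0)
    return (np.clip(ratio, 0.0, 1.0) * 255).astype(np.uint8)


def otsu_threshold(img):
    """Classic Otsu threshold on a uint8 image; pixels <= t form the dark class."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # dark-class probability
    mu = np.cumsum(prob * np.arange(256))  # dark-class cumulative mean
    denom = omega * (1.0 - omega)
    valid = denom > 1e-12                  # skip thresholds with an empty class
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def global_step(img, t):
    """Modified global step: pixels above t become white (255); the rest keep
    their gray level, so the output is an intermediate gray image, not a binary one."""
    out = img.copy()
    out[img > t] = 255
    return out


def sauvola_local(img, size=15, k=0.2, r=128.0):
    """Assumption: Sauvola threshold T = m * (1 + k * (s / r - 1)) per pixel."""
    f = img.astype(float)
    mean = window_filter(f, size, np.mean)
    std = window_filter(f, size, np.std)
    return mean * (1.0 + k * (std / r - 1.0))


def binarize(img):
    """Hybrid pipeline sketch; returns a boolean mask that is True on text pixels."""
    background = estimate_background(img)
    contrast = contrast_image(img, background)
    intermediate = global_step(contrast, otsu_threshold(contrast))
    local_t = sauvola_local(intermediate)
    # Only pixels that survived the global step can be classified as text.
    return (intermediate < 255) & (intermediate <= local_t)
```

For a grayscale `uint8` page `img`, `binarize(img)` returns a mask that is `True` on text pixels. Because the global step sets every pixel above the Otsu threshold to 255 instead of emitting a binary image, the local step only has to separate text from the dark pixels that remain. The closing window must be wider than the thickest stroke, otherwise strokes leak into the background estimate.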
