Abstract

A vast number of historical and badly degraded document images can be found in libraries and in public and national archives. Because they contain many different kinds of artifacts, such poor-quality documents are hard to read and to process. In this paper, a novel adaptive binarization algorithm using a ternary entropy-based approach is proposed. Given an input image, the intensity contrast is first estimated by a grayscale morphological closing operator. A double threshold is then generated by our Shannon entropy-based ternarizing method to classify pixels into text, near-text, and non-text regions. The pixels in the near-text region are relabeled using the local mean and standard deviation. Our proposed method classifies noise into two categories, which are processed by binary morphological operators, shrink and swell filters, and a graph searching strategy. The method is tested on three databases that were used in the Document Image Binarization Contest 2009 (DIBCO 2009), the Handwriting Document Image Binarization Contest 2010 (H-DIBCO 2010), and the International Conference on Frontiers in Handwriting Recognition 2010 (ICFHR 2010). The evaluation is based on nine distinct measures. Experimental results show that our proposed algorithm outperforms other state-of-the-art methods.
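
The double-threshold step can be illustrated with a minimal sketch. The abstract does not give the entropy criterion, so the code below assumes a Kapur-style rule: pick the two thresholds that maximize the sum of the Shannon entropies of the three resulting histogram classes. This may differ from the authors' formulation. The function names `ternary_entropy_thresholds` and `ternarize` are hypothetical, and which of the three labels corresponds to text depends on the polarity of the contrast image, which is also left as an assumption here.

```python
import math


def ternary_entropy_thresholds(hist):
    """Return (t1, t2) splitting hist into [0, t1), [t1, t2), [t2, n).

    Assumed criterion (Kapur-style, not confirmed by the abstract):
    maximize the sum of the Shannon entropies of the three classes.
    """
    n = len(hist)
    total = sum(hist)
    if n < 3 or total <= 0:
        raise ValueError("need at least 3 bins and a non-empty histogram")

    # Prefix sums let each class entropy be evaluated in O(1).
    cum_count = [0]
    cum_plogp = [0.0]
    for count in hist:
        p = count / total
        cum_count.append(cum_count[-1] + count)
        cum_plogp.append(cum_plogp[-1] + (p * math.log(p) if p > 0 else 0.0))

    def class_entropy(lo, hi):
        # Entropy of bins lo..hi-1 after renormalizing to mass 1:
        # H = log(m) - (sum p*log p) / m, with m the class mass.
        count = cum_count[hi] - cum_count[lo]
        if count == 0:
            return 0.0
        mass = count / total
        return math.log(mass) - (cum_plogp[hi] - cum_plogp[lo]) / mass

    best_score, best = -math.inf, (1, 2)
    for t1 in range(1, n - 1):
        for t2 in range(t1 + 1, n):
            score = (class_entropy(0, t1)
                     + class_entropy(t1, t2)
                     + class_entropy(t2, n))
            if score > best_score:
                best_score, best = score, (t1, t2)
    return best


def ternarize(values, t1, t2):
    """Label each value 0 (below t1), 1 (in [t1, t2)), or 2 (t2 and above)."""
    return [0 if v < t1 else 1 if v < t2 else 2 for v in values]


if __name__ == "__main__":
    # Toy 6-bin histogram; a real contrast image would have 256 bins.
    t1, t2 = ternary_entropy_thresholds([1, 1, 1, 1, 1, 1])
    print(t1, t2, ternarize([0, 2, 3, 4, 5], t1, t2))
```

In the pipeline described above, the middle label would mark the near-text pixels that are later relabeled from the local mean and standard deviation; that refinement and the noise-removal stages are not sketched here.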
