Abstract

The classical Otsu method is a common tool in document image binarization. Often the two classes, text and background, are imbalanced, which means that the assumption of the classical Otsu method is not met. In this work, we considered imbalanced pixel classes of background and text: the weights of the two classes are different, but their variances are the same. We demonstrated experimentally that a criterion that takes the imbalance of the class weights into account attains higher binarization accuracy. We described the generalization of the criteria to a two-parametric model, for which we proposed an algorithm that searches for the optimal linear separation via fast linear clustering. We also demonstrated that the two-parametric model with the proposed separation increases the binarization accuracy for documents with a complex background or spots.
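The difference between the two one-dimensional criteria can be illustrated with a minimal sketch. The classical score below is the standard Otsu within-class variance. The weight-aware score is an assumption: it is the classification log-likelihood of two Gaussians with a shared variance and unequal weights, and it may differ in form from the criterion of [20] that the paper evaluates. The function names (`split_stats`, `best_threshold`, `otsu_score`, `unbalanced_score`) are hypothetical, and the exhaustive search is written for clarity rather than speed.

```python
import math


def split_stats(hist, t):
    """Class weights and pooled within-class variance for the split
    {0..t} | {t+1..L-1}; returns None when either class is empty."""
    total = sum(hist)
    lo = range(t + 1)
    hi = range(t + 1, len(hist))
    n0 = sum(hist[i] for i in lo)
    n1 = total - n0
    if n0 == 0 or n1 == 0:
        return None
    mu0 = sum(i * hist[i] for i in lo) / n0
    mu1 = sum(i * hist[i] for i in hi) / n1
    ss = sum(hist[i] * (i - mu0) ** 2 for i in lo)
    ss += sum(hist[i] * (i - mu1) ** 2 for i in hi)
    return n0 / total, n1 / total, ss / total


def best_threshold(hist, score):
    """Return the first threshold t that maximises score(w0, w1, var_w)."""
    best_t, best = None, -math.inf
    for t in range(len(hist) - 1):
        stats = split_stats(hist, t)
        if stats is None:
            continue
        value = score(*stats)
        if value > best:
            best_t, best = t, value
    return best_t


def otsu_score(w0, w1, var_w):
    # Classical Otsu: minimising the within-class variance is the same
    # as maximising its negative.
    return -var_w


def unbalanced_score(w0, w1, var_w, eps=1e-12):
    # ASSUMPTION: equal-variance, unequal-weight Gaussian model.
    # The weight terms penalise splits only through the class sizes;
    # eps guards against log(0) for perfectly separated classes.
    return (w0 * math.log(w0) + w1 * math.log(w1)
            - 0.5 * math.log(max(var_w, eps)))


# Toy 9-level histogram with two equal modes separated by an empty gap.
hist = [1, 4, 1, 0, 0, 0, 1, 4, 1]
t_otsu = best_threshold(hist, otsu_score)
t_unbalanced = best_threshold(hist, unbalanced_score)
```

On this balanced toy histogram both scores place the threshold inside the empty gap between the modes. When the class weights are strongly imbalanced, the two scores can select different thresholds, which is the situation the abstract addresses.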

Highlights

  • We demonstrate that the unbalanced modification of the Otsu method [20] outperforms the classical Otsu method by approximately 3 points in the pseudo F-Measure metric

  • Later, in [31], Jian Gong et al. suggested a fast computation of the two-dimensional Otsu criterion described in [30] via cumulative summation: once the cumulative histogram image has been computed, the weight of any orthotropic rectangle with a vertex at the origin can be calculated in constant time

  • We evaluate the accuracy of the methods using one of the metrics of the Document Image Binarization Contest (DIBCO)

Summary

Introduction

The well-known Niblack's method [11] in its classical implementation has two tuning parameters, one of which. They demonstrated that under such assumptions the binarization accuracy could be increased, but the testing dataset was not large (only 13 images). In their other work [28], the authors proposed an approach to accelerate the classical Otsu method. Later, in [31], Jian Gong et al. suggested a fast computation of the two-dimensional Otsu criterion described in [30] via cumulative summation: once the cumulative histogram image has been computed, the weight (the sum of all histogram pixels) of any orthotropic rectangle with a vertex at the origin (such a rectangle was used as the separating surface) can be calculated in constant time. The same holds for other statistics (mean, variance, etc.). The Otsu threshold is used as an input parameter during pre-processing [39, 40], for the binarization of “simple” images [15, 41–43], and for global threshold value computation [44].
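The cumulative summation idea attributed to [31] can be sketched as follows. This is an illustration of the general summed-area technique under stated assumptions, not the implementation from [31]: the two-dimensional histogram is a plain list of lists, and the names `cumulative_histogram` and `rectangle_weight` are hypothetical.

```python
def cumulative_histogram(hist2d):
    """Build cum[i][j] = sum of hist2d[a][b] for all a <= i and b <= j.

    The table is filled in a single pass over the histogram."""
    rows, cols = len(hist2d), len(hist2d[0])
    cum = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        row_sum = 0  # running sum of the current row up to column j
        for j in range(cols):
            row_sum += hist2d[i][j]
            above = cum[i - 1][j] if i > 0 else 0
            cum[i][j] = row_sum + above
    return cum


def rectangle_weight(cum, s, t):
    """Weight of the rectangle [0..s] x [0..t] anchored at the origin.

    With the cumulative table precomputed this is a single lookup."""
    return cum[s][t]


# Toy 3x3 two-dimensional histogram.
hist2d = [
    [1, 2, 0],
    [0, 3, 1],
    [4, 0, 2],
]
cum = cumulative_histogram(hist2d)
weight = rectangle_weight(cum, 1, 1)  # covers the cells 1, 2, 0 and 3
```

Applying the same construction to a histogram whose cells are multiplied by a feature value (or its square) gives the sums needed for the class means and variances, which is one way to read the remark that the approach works for other statistics.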

Test datasets and accuracy measure
One-dimensional criteria of Otsu binarization
The Otsu method for image binarization via two features
Findings
Conclusion