Local thresholding of degraded or unevenly illuminated documents using fuzzy inclusion and entropy measures

Athanasios C Bogiatzis, Basil K Papadopoulos

doi:10.1007/s12530-018-09262-5

Abstract

Many applications require the content of a scanned document to be recognized or improved. This is often achieved by converting the input into a binary image, which is the first step in many document analysis systems and optical character recognition (OCR) processes. When the input is degraded or non-uniformly illuminated, global thresholding algorithms fail to deliver adequate results. In such cases we have to use local thresholding techniques, which binarize each pixel based on the grayscale information of its neighboring pixels. In this paper, we present a local thresholding method based on specific fuzzy inclusion and entropy measures that we introduced in previous work. We use these measures to quantify specific attributes of the neighborhood of a pixel and then calculate an appropriate threshold from these values. We do not use the histogram of the image, nor any statistical measures or contrast parameters that depend on the input. The procedure is open, automated and adaptable, and here we present some implementations of a more general algorithm along with specific results. Our experiments focus mainly on texts containing lighting “irregularities”, but we also make some remarks on further generalization. Finally, we comment on other potential uses of these measures and on the prospect of connecting them with other studies that already use fuzzy inclusion and entropy measures.
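
The difference between global and local thresholding described above can be illustrated with a minimal sketch. It is not the method of this paper: in place of the fuzzy inclusion and entropy measures it uses a plain neighborhood mean, a common baseline, purely to show how a per-pixel threshold copes with uneven illumination. The function name `local_mean_threshold`, the parameters `radius` and `offset`, and the synthetic `page` are all assumptions made for the illustration.

```python
def local_mean_threshold(gray, radius=1, offset=0.0):
    """Return a binary image: True = background (bright), False = ink (dark).

    Illustrative baseline only (NOT the fuzzy inclusion/entropy method):
    each pixel is compared with the mean of its (2*radius+1)^2 neighborhood,
    lowered by `offset`. Neighbors outside the image are replaced by the
    nearest edge pixel.
    """
    height, width = len(gray), len(gray[0])
    size = 2 * radius + 1
    binary = []
    for r in range(height):
        row_out = []
        for c in range(width):
            total = 0.0
            for dr in range(-radius, radius + 1):
                rr = min(max(r + dr, 0), height - 1)  # clamp to image rows
                for dc in range(-radius, radius + 1):
                    cc = min(max(c + dc, 0), width - 1)  # clamp to image columns
                    total += gray[rr][cc]
            threshold = total / (size * size) - offset
            row_out.append(gray[r][c] > threshold)
        binary.append(row_out)
    return binary


# Hypothetical synthetic page: brightness ramps from 100 (left) to 200 (right),
# with two "ink" pixels that are 60 gray levels darker than their surroundings.
page = [[100 + 10 * c for c in range(11)] for _ in range(5)]
page[2][2] -= 60
page[2][8] -= 60

binary = local_mean_threshold(page, radius=2, offset=10)
local_ink = [(r, c) for r in range(5) for c in range(11) if not binary[r][c]]

# For comparison: one global threshold applied to the whole page.
global_ink = [(r, c) for r in range(5) for c in range(11) if page[r][c] <= 110]
```

On this synthetic input, `local_ink` contains only the two darkened pixels, whereas the single global threshold of 110 marks the dim left columns as ink and misses the ink pixel on the bright side. The nested loops keep the sketch free of dependencies; a practical implementation would use an integral image or a library filter.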

Full Text