Abstract

This paper proposes a novel local threshold binarization method based on fast Fuzzy C-Means clustering. Historical document images with non-uniform backgrounds, stains, and faded ink are first processed by removing the background with an inpainting-based method. Fuzzy C-Means clustering then groups the pixels into three main clusters: sure text pixels, sure background pixels, and confused pixels, which may or may not be text. These confused pixels are then classified as text or background based on the structural symmetry of pixels (SSP). The SSP are defined as those pixels around strokes whose gradient magnitudes are large enough and whose gradient directions are opposite. Because the gradient map is the basis for computing the SSP, we further propose to estimate the background surface first and to extract potential SSP from the compensated image, so as to deal with degradations of document images such as uneven illumination, low contrast, and stains. To demonstrate the effectiveness of our method, tests on eight public document image datasets are performed, and the experimental results show that our method outperforms other local threshold binarization approaches on both F-measure and PSNR.
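
To illustrate the three-cluster step, the following is a minimal sketch of standard Fuzzy C-Means on grayscale intensities. It is not the fast variant used in the paper, and it omits both the inpainting-based background removal and the SSP-based resolution of confused pixels. The function names `fuzzy_c_means_1d` and `label_pixels`, the fuzzifier `m = 2`, the evenly spaced initial centers, and the assumption of dark text on a light background are all illustrative choices rather than details taken from the paper.

```python
import numpy as np


def _memberships(x, centers, m):
    # Standard FCM membership: u_ik is proportional to d_ik^(-2/(m-1)),
    # normalized so each pixel's memberships sum to 1.
    dist = np.abs(x[:, None] - centers[None, :]) + 1e-12  # avoid division by zero
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means_1d(values, n_clusters=3, m=2.0, n_iter=100, tol=1e-5):
    """Cluster scalar intensities with plain Fuzzy C-Means (assumed settings)."""
    x = np.asarray(values, dtype=float).ravel()
    # Assumption: initialize centers evenly across the intensity range.
    centers = np.linspace(x.min(), x.max(), n_clusters)
    for _ in range(n_iter):
        um = _memberships(x, centers, m) ** m
        # Each center is the membership-weighted mean of the intensities.
        new_centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(x, centers, m)


def label_pixels(centers, u):
    """Map each pixel to 0 = sure text, 1 = confused, 2 = sure background.

    Assumes dark text on a light background, so the darkest cluster is text.
    """
    order = np.argsort(centers)
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    return rank[u.argmax(axis=1)]
```

For a 2-D image `img`, one possible usage is `centers, u = fuzzy_c_means_1d(img)` followed by `label_pixels(centers, u).reshape(img.shape)`. Pixels labeled 1 would then be passed to a later stage, such as the SSP test described above, for the final text or background decision.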

