Abstract

In this paper, we propose a novel method for multispectral document image binarization (MSdB) based on Non-negative Matrix Factorization (NMF). The proposed MSdB-NMF framework has three steps: i) an NMF-based feature extraction algorithm built on a new optimization problem; ii) a post-processing method; and iii) the application of any existing gray-level/RGB binarization scheme. In the first step, we extract N features out of B spectral bands (N < B), together with their corresponding coefficient matrix. We introduce a novel objective formulation that accounts for robustness (to noise and various types of degradation) and sparseness (related to the ratio of text pixels to background pixels). We solve the proposed minimization problem with multiplicative update rules and prove the convergence of the resulting feature extraction algorithm. In the second step, we select an appropriate feature vector or, equivalently, the corresponding coefficient vector. This selection is made either visually or automatically through a post-processing method that uses benchmark binarization methods as a baseline. In the last step, we apply existing binarization methods, such as Sauvola and Howe, to the selected coefficient vector. The proposed framework is applicable to any multispectral (MS) or hyperspectral (HS) document image and requires no prior knowledge, such as side information about the spectral bands. We evaluate the framework on two MS document image datasets. The experimental results confirm that it outperforms several state-of-the-art binarization schemes, including the winner of the MS-TEx-2015 contest.
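The overall pipeline (factor the band-by-pixel matrix, pick one coefficient map, binarize it) can be sketched as follows. This is a minimal illustration assuming NumPy, and it deliberately swaps in the standard Frobenius-norm NMF objective with Lee-Seung multiplicative updates instead of the robust, sparse objective proposed in the paper, whose exact form is not given in the abstract. The helper names (`nmf_multiplicative`, `coefficient_maps`, `to_text_dark`, `sauvola`) and the Sauvola parameters (`window`, `k`, `R`, with `R = 0.5` assumed for maps scaled to [0, 1]) are hypothetical choices for illustration; the threshold follows the usual Sauvola rule T = m(1 + k(s/R - 1)).

```python
import numpy as np


def nmf_multiplicative(X, n_features, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix X (B x P) as W (B x N) @ H (N x P).

    Plain Frobenius-norm NMF with Lee-Seung multiplicative updates;
    NOT the robust/sparse objective proposed in the paper (assumption).
    Returns W, H and the reconstruction error after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_bands, n_pixels = X.shape
    W = rng.random((n_bands, n_features)) + eps
    H = rng.random((n_features, n_pixels)) + eps
    errors = []
    for _ in range(n_iter):
        # Element-wise ratios of nonnegative terms keep W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors


def coefficient_maps(cube, n_features, **kwargs):
    """Turn a (rows, cols, B) cube into N coefficient images of shape (rows, cols)."""
    rows, cols, n_bands = cube.shape
    X = cube.reshape(-1, n_bands).T  # one column per pixel, row-major order
    W, H, errors = nmf_multiplicative(X, n_features, **kwargs)
    return W, H.reshape(n_features, rows, cols), errors


def to_text_dark(coef_map, text_is_bright=True):
    """Rescale a coefficient map to [0, 1], inverting it if text is bright."""
    scaled = (coef_map - coef_map.min()) / (np.ptp(coef_map) + 1e-12)
    return 1.0 - scaled if text_is_bright else scaled


def sauvola(img, window=15, k=0.2, R=0.5):
    """Sauvola thresholding of a [0, 1] image with an odd window size.

    T = m * (1 + k * (s / R - 1)), with local mean m and std s computed
    from integral images. Returns True for text (dark) pixels.
    """
    pad = window // 2
    p = np.pad(img, pad, mode="edge")
    # Integral images of the padded map and its square, with a leading zero row/column.
    s1 = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    s2 = np.pad(p * p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape

    def box(s):
        # Sum over the window x window box centred on each original pixel.
        return (s[window:window + h, window:window + w] - s[:h, window:window + w]
                - s[window:window + h, :w] + s[:h, :w])

    n = window * window
    mean = box(s1) / n
    std = np.sqrt(np.maximum(box(s2) / n - mean ** 2, 0.0))
    threshold = mean * (1.0 + k * (std / R - 1.0))
    return img <= threshold
```

For a cube with B bands, `W, maps, errors = coefficient_maps(cube, n_features=3)` (with N < B) yields candidate coefficient maps. After choosing an index `idx`, visually or by comparing against a baseline binarization as described above, `sauvola(to_text_dark(maps[idx]))` gives a binary text mask. Whether text appears bright or dark in a given map depends on the factorization, which is why the polarity is left to the hypothetical `text_is_bright` flag.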
