Abstract

Historical documents suffer from several types of noise that accumulate and evolve over time. This significantly degrades their quality and makes both their storage and the interpretation of their visual content problematic. Digital preservation seems the most viable and promising solution. Moreover, measuring the amount of degradation and assessing the quality of degraded documents is highly desirable for applications such as selecting the proper algorithms for enhancement and analysis of document images, filtering out damaged images, tuning the parameters of processing algorithms, document repair, and psychological studies. The first contribution of this work is an efficient Multi-distortion Document Quality Measure (MDQM) for quality assessment of physically degraded document images. The proposed MDQM metric is based on three sets of spatial and frequency image features. These features are extracted from two layers, text and non-text, and mapped to the mean opinion scores (MOS) using a regression function. The second contribution of this work is to estimate the probability of four common document image distortion types in the degraded images, namely paper translucency, stains, readers' annotations, and worn holes. In our experiments, the correlations of seven no-reference image quality assessment (NR-IQA) metrics with the MOS values are evaluated on two available datasets. The MDQM metric is shown to perform significantly better than the state-of-the-art NR-IQA metrics. Moreover, the experimental results demonstrate that the MDQM metric not only classifies the various degradations with high efficacy but also maintains remarkable run-time efficiency. It is worth mentioning that the proposed method has been applied to Arabic documents.
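As an illustration of the evaluation described above, the agreement between a quality metric's scores and the MOS values can be quantified with correlation coefficients. The abstract does not state which coefficients are used; the sketch below assumes the linear (Pearson) and rank-order (Spearman) correlations that are common in image quality assessment. The function names and the toy values are hypothetical and are not taken from the paper or its datasets.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: subjective MOS and one metric's predicted scores.
    mos = [1.8, 2.5, 3.1, 3.9, 4.6]
    predicted = [0.22, 0.35, 0.33, 0.61, 0.80]
    print("Pearson:", pearson(mos, predicted))
    print("Spearman:", spearman(mos, predicted))
```

A metric whose scores track the MOS values more closely yields coefficients nearer to 1 in absolute value; the rank-order coefficient depends only on the ordering of the scores, so it is unaffected by any monotonic rescaling of the metric.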
