Abstract
In this paper we focus on ancient manuscripts, acquired in the RGB modality, that are degraded by complex background textures interfering with the text of interest. Removing these artifacts is not trivial, especially in ancient originals, where they are usually very strong. Rather than applying techniques that merely cancel the interferences, we take the approach of separating, extracting, and classifying the various patterns superimposed in the document. We show that representing RGB images in different color spaces can be effective for this goal. Although RGB is the most frequently used color representation in image processing, it does not maximize the information content of the image. For this reason, several color spaces have been developed in the literature for analysis tasks such as object segmentation and edge detection. Some color spaces seem particularly suitable for the analysis of degraded documents: they allow the contents to be enhanced, the text readability to be improved, partially hidden features to be extracted, and thresholding techniques for text binarization to perform better. We present and discuss several examples of the successful application of both fixed color spaces and self-adaptive color spaces, based on the decorrelation of the original RGB channels. We also show that even simpler arithmetic operations among the channels can be effective for removing bleed-through, refocusing and improving the contrast of the foreground text, and recovering the original RGB appearance of the enhanced document.
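To make the idea of a self-adaptive color space concrete, the following is a minimal sketch that decorrelates the RGB channels of an image with principal component analysis (PCA). PCA is assumed here purely for illustration, since the abstract does not state which decorrelation technique is used. The function name `decorrelate_rgb` and its return values are hypothetical.

```python
import numpy as np


def decorrelate_rgb(image):
    """Project RGB pixels onto the principal axes of their covariance.

    Assumption: PCA stands in for the (unspecified) decorrelation method.
    `image` is an (H, W, 3) array; the result has the same shape, with
    channels that are mutually uncorrelated over the image.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)

    # Center the data so the covariance describes channel co-variation only.
    mean = pixels.mean(axis=0)
    centered = pixels - mean

    # 3x3 covariance between the R, G, and B channels.
    cov = np.cov(centered, rowvar=False)

    # eigh returns eigenvalues in ascending order for symmetric matrices;
    # reorder so the first output channel carries the largest variance.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvecs = eigvecs[:, order]

    # Rotate every pixel into the image-specific (self-adaptive) basis.
    channels = centered @ eigvecs
    return channels.reshape(h, w, c), eigvecs, mean
```

Because the basis is orthonormal, the original RGB values can be recovered with `channels.reshape(-1, 3) @ eigvecs.T + mean`. In a document image, one might then inspect each output channel separately, on the assumption that text and background patterns tend to fall into different components.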