Abstract

Binarization is often used as a preprocessing step for pixel-wise text extraction from scanned historical documents. Today, these documents are scanned in color and at high resolution. Reducing the color images to grayscale and then binarizing them implies a loss of information and often leads to unsatisfactory processing results. In this paper, color segmentation is used instead of binarization to separate text from background in historical manuscripts. We present a color segmentation approach based on Markov random fields, with a reduced set of required parameters, that segments text written in different colors from noisy page background. First tests with historical Arabic manuscripts show promising results. For words written in light red, our approach yields better results than a state-of-the-art binarization approach.
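To make the general idea concrete, the following is a minimal sketch of Markov random field labeling on color pixels. It is not the method of this paper: it assumes a Potts smoothness prior, fixed class mean colors, a squared RGB distance as data term, and iterated conditional modes (ICM) as optimizer. The names `icm_segment`, `MEANS`, and `beta` and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]  # RGB in [0, 1]


def sq_dist(a: Color, b: Color) -> float:
    """Squared Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def icm_segment(image: Sequence[Sequence[Color]],
                means: Sequence[Color],
                beta: float = 0.05,
                n_iter: int = 5) -> List[List[int]]:
    """Label each pixel with a class index using ICM on a Potts MRF.

    Energy of label k at a pixel = color distance to means[k]
    + beta * (number of 4-neighbors currently labeled differently).
    """
    h, w = len(image), len(image[0])
    classes = range(len(means))
    # Initial labeling: nearest class mean, no spatial context.
    labels = [[min(classes, key=lambda k: sq_dist(image[y][x], means[k]))
               for x in range(w)] for y in range(h)]
    for _ in range(n_iter):
        changed = False
        for y in range(h):
            for x in range(w):
                nbrs = [labels[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x),
                                       (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w]

                def energy(k: int) -> float:
                    data = sq_dist(image[y][x], means[k])
                    smooth = beta * sum(n != k for n in nbrs)
                    return data + smooth

                best = min(classes, key=energy)
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:  # stop once a full sweep changes nothing
            break
    return labels


# Hypothetical class means: page background, light red ink, dark ink.
BG, RED, INK = (0.9, 0.85, 0.7), (0.8, 0.3, 0.3), (0.1, 0.1, 0.1)
MEANS = [BG, RED, INK]

# A 3x3 background patch with one stained pixel that is closer to red.
NOISY = (0.85, 0.55, 0.5)
page = [[BG, BG, BG], [BG, NOISY, BG], [BG, BG, BG]]

smoothed = icm_segment(page, MEANS, beta=0.05)
unsmoothed = icm_segment(page, MEANS, beta=0.0)
```

With `beta=0.0` the labeling reduces to per-pixel nearest-color classification, so the stained pixel is labeled as red ink. With a positive `beta` the four background neighbors outweigh the small color difference and the pixel is relabeled as background, which is the kind of noise suppression a spatial prior is meant to provide.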
