Abstract

This article presents a new method for the binarization of color document images. Initially, the colors of the document image are reduced to a small number using a new color reduction tech- nique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them in order that the back- ground and text components to become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, in each color plane a CC filtering procedure is applied which is followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed which are next redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or nontext. The identified text blocks are binarized properly using suita- ble binarization techniques, considering the rest of the pixels as background. The final result is a binary image which contains always black characters in white background independently of the original colors of each text block. The proposed document binarization approach can also be used for binarization of noisy color (or gray- scale) document images. Several experiments that confirm the effec- tiveness of the proposed technique are presented. V C 2007 Wiley

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.