Abstract

Recognition of documents of poor image quality is a challenging and practically important problem. In traditional approaches, features such as center lines of strokes or contours are extracted from binary images obtained by thresholding the gray-scale intensity images. Wang and Pavlidis (IEEE Trans. Pattern Anal. Machine Intell. 15(10), 1993, 1053–1067) have recently pointed out that effective features for recognition should be extracted directly from the original gray-scale intensity images in order to avoid the significant information loss caused by binarization. In this paper, a novel method is presented for extracting closed boundaries of document components, such as characters and symbols, directly from gray-scale document images, based on surface data structures and structural features. The gray-scale document image can be treated as a surface defined over a two-dimensional space by regarding the intensity value associated with each pixel as a height. The method is based on a simple model that assumes a closed boundary of a document component can be approximated as a series of horizontal (parallel to the image plane) line segments, and that it can be extracted by linking surface components with steep gradients based on the configurations of intersections between horizontal planes and surface components. Furthermore, the gray-scale image can be converted into a binary image based on the extracted boundaries, so that any recognition system can accept the output of the proposed algorithm as input. The performance of the proposed algorithm is compared with several binarization algorithms based on global and local thresholding of intensity values, and it is shown to be effective in improving recognition accuracy for very poor quality data.
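
The surface view described above can be illustrated with a minimal sketch. It is not the algorithm of the paper: it only marks pixels where the intensity surface crosses a single horizontal plane with a steep gradient, and it omits the linking of surface components into closed boundaries. The function name `steep_level_crossings` and the parameters `level` and `min_gradient` are hypothetical, introduced here for illustration only.

```python
import numpy as np

def steep_level_crossings(image, level, min_gradient):
    """Mark pixels where the intensity surface crosses the horizontal
    plane z = level with a steep local gradient.

    Hypothetical helper for illustration; not the method of the paper.
    """
    # Treat each pixel intensity as a height over the image plane.
    surface = np.asarray(image, dtype=float)

    # Central-difference gradient of the surface (rows first, then columns).
    gy, gx = np.gradient(surface)
    steep = np.hypot(gx, gy) >= min_gradient

    # A crossing lies between two 4-neighbours on opposite sides of the plane.
    below = surface < level
    crossing = np.zeros(surface.shape, dtype=bool)
    horiz = below[:, :-1] != below[:, 1:]
    vert = below[:-1, :] != below[1:, :]
    crossing[:, :-1] |= horiz
    crossing[:, 1:] |= horiz
    crossing[:-1, :] |= vert
    crossing[1:, :] |= vert

    # Keep only crossings that sit on a steep part of the surface.
    return crossing & steep

# Toy example: a dark vertical stroke (columns 2-3) on a bright background.
stroke = np.full((6, 6), 200.0)
stroke[:, 2:4] = 50.0
mask = steep_level_crossings(stroke, level=125.0, min_gradient=50.0)
```

In this toy image the marked pixels are those on either side of the two stroke edges. A fuller treatment would repeat the test over many plane heights and link the resulting segments, which is the part the paper contributes and which this sketch does not attempt.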
