Abstract

In a document analysis system, block segmentation and block classification are essential steps. The former segments a document image into homogeneous rectangular blocks. The latter classifies the segmented blocks into categories, so that each block can then be processed by a suitable recognition system. In this paper, we formalize the structure styles of general documents, and then propose both a robust hierarchical method of block segmentation and a simple method of block classification. The proposed block segmentation method takes a top‐down hierarchical approach based on spatial features and the formalized concepts of document structure. This method is essentially independent of the document style, and can perform a type of structural analysis of the document image. The classification approach is based on a new scheme of statistical textual features, and classifies the segmented blocks into four categories: text blocks, title letter blocks, line drawing (or graphics) blocks, and halftone photograph blocks. The proposed approaches were implemented in the C programming language in an X‐Window environment under the UNIX operating system. The performance of each approach was experimentally evaluated, for both effectiveness and computational efficiency, using actual test images.
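To make the idea of top-down segmentation into rectangular blocks concrete, here is a minimal sketch. The abstract does not give the authors' procedure, so this uses the classic recursive X-Y cut on projection profiles as a stand-in; it is not the method proposed in the paper. The function names, the `min_gap` threshold, and the tiny binary test image are all assumptions made for illustration, and the sketch is in Python rather than the C used by the authors.

```python
def projection(img, top, bottom, left, right, axis):
    """Count ink pixels per row (axis=0) or per column (axis=1) of a region."""
    if axis == 0:
        return [sum(img[y][left:right]) for y in range(top, bottom)]
    return [sum(img[y][x] for y in range(top, bottom)) for x in range(left, right)]


def ink_runs(profile, min_gap):
    """Return (start, end) runs of ink, split only at blank gaps >= min_gap."""
    runs, start, last_ink, gap = [], None, None, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            elif gap >= min_gap:
                runs.append((start, last_ink + 1))
                start = i
            last_ink = i
            gap = 0
        else:
            gap += 1
    if start is not None:
        runs.append((start, last_ink + 1))
    return runs


def xy_cut(img, top, bottom, left, right, min_gap=2):
    """Recursively split a region; return leaf blocks as (top, bottom, left, right)."""
    rows = ink_runs(projection(img, top, bottom, left, right, 0), min_gap)
    if not rows:
        return []  # region contains no ink at all
    if len(rows) > 1:  # horizontal white gap found: cut into bands
        blocks = []
        for a, b in rows:
            blocks += xy_cut(img, top + a, top + b, left, right, min_gap)
        return blocks
    top, bottom = top + rows[0][0], top + rows[0][1]  # trim blank rows
    cols = ink_runs(projection(img, top, bottom, left, right, 1), min_gap)
    if len(cols) > 1:  # vertical white gap found: cut into columns
        blocks = []
        for a, b in cols:
            blocks += xy_cut(img, top, bottom, left + a, left + b, min_gap)
        return blocks
    left, right = left + cols[0][0], left + cols[0][1]  # trim blank columns
    return [(top, bottom, left, right)]


# Hypothetical 5x7 binary page: two blocks side by side above one wide block.
page = [
    [1, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
]
blocks = xy_cut(page, 0, len(page), 0, len(page[0]))
print(blocks)  # [(0, 2, 0, 3), (0, 2, 5, 7), (4, 5, 0, 7)]
```

Each recursion level corresponds to one level of the block hierarchy, which is why a top-down cut of this kind also yields a coarse structural description of the page. A real system would tune the gap threshold to the scan resolution and handle noise and skew, none of which is attempted here.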
