Abstract

This paper addresses the problem of semantic overlap across document objects in the context of ground truth representation for document layout analysis. Document object categories often share low-level primitives (e.g., regions inside the bars of a bar chart resemble background). Because most datasets and ground truth models describe document objects rather than pixels, this overlap makes it difficult to evaluate document layout segmentation methods based on pixel classification. We propose a novel ground truth model that draws on concepts from both structural and statistical pattern recognition: statistical pixel-based data derived from low-level elemental patterns are layered onto high-level structural object-based data. We also present evaluation metrics that take advantage of the layered ground truth model, allowing a contextual evaluation of pixel classification algorithms. We apply the proposed model to two recent pixel classification approaches, evaluated on business document images that exhibit a challenging mixture of textual, graphical, and pictorial elements across varied layouts. The proposed model yields very detailed, comprehensive, and intuitive information on the strengths and limitations of the evaluated approaches that would be impossible to obtain through other models.
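The idea of a contextual evaluation over a layered ground truth can be illustrated with a minimal sketch. It is not the metric defined in the paper: the function name `contextual_pixel_accuracy`, the class and object codes, and the toy arrays are all hypothetical, and plain pixel accuracy stands in for whatever measures the authors actually use. The sketch assumes two aligned label maps per image, one holding low-level pixel classes and one holding high-level object categories.

```python
import numpy as np

# Hypothetical label codes, chosen only for this illustration.
BACKGROUND, TEXT, GRAPHIC = 0, 1, 2   # low-level pixel classes
PAGE, PARAGRAPH, BAR_CHART = 0, 1, 2  # high-level object categories


def contextual_pixel_accuracy(pred, pixel_gt, object_gt):
    """Pixel accuracy of `pred` against `pixel_gt`, grouped by object category.

    All three arrays must have the same shape. `object_gt` is the structural
    layer and `pixel_gt` the pixel layer of the (assumed) layered ground truth.
    """
    if not (pred.shape == pixel_gt.shape == object_gt.shape):
        raise ValueError("pred, pixel_gt and object_gt must share one shape")
    correct = pred == pixel_gt
    return {
        int(obj): float(correct[object_gt == obj].mean())
        for obj in np.unique(object_gt)
    }


# Toy 2x4 image: a paragraph (top left), a bar chart (right), bare page elsewhere.
object_gt = np.array([[PARAGRAPH, PARAGRAPH, BAR_CHART, BAR_CHART],
                      [PAGE,      PAGE,      BAR_CHART, BAR_CHART]])
# Inside the chart, some pixels are background at the pixel level.
pixel_gt = np.array([[TEXT,       BACKGROUND, GRAPHIC, BACKGROUND],
                     [BACKGROUND, BACKGROUND, GRAPHIC, BACKGROUND]])
# A classifier that labels every chart pixel as graphic.
pred = np.array([[TEXT,       BACKGROUND, GRAPHIC, GRAPHIC],
                 [BACKGROUND, BACKGROUND, GRAPHIC, GRAPHIC]])

scores = contextual_pixel_accuracy(pred, pixel_gt, object_gt)
overall = float((pred == pixel_gt).mean())
```

In this toy case the overall accuracy hides where the errors occur, while the per-object breakdown shows that all misclassified pixels fall inside the bar chart, which is the kind of context an object-only or pixel-only ground truth cannot provide.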
