Abstract

Tables are an easy way to represent information in a structural form. Table recognition is important for the extraction of such information from document images. Usually, modern OCR systems provide textual information coming from tables without recognizing actual table structure. However, recognition of table structure is important to get the contextual meaning of the contents. Table structure recognition in heterogeneous documents is challenging due to a variety of table layouts. It becomes harder where no physical rulings are present in a table. This work proposes a novel learning based methodology for the recognition of table contents in heterogeneous document images. Textual contents of documents are classified as table or non-table elements using a pre-trained neural network model. The output of the neural network is further enhanced by applying a contextual post processing on each element to correct the classifications errors if any. The system is trained using a subset of UNLV and UW3 document images and depicted more than 97% accuracy on a test set in detection of table and non-table elements.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.