Abstract

Convolutional neural networks (CNNs) have achieved state-of-the-art results on computer vision and natural language processing (NLP) problems. This has motivated researchers to apply deep learning-based techniques to document layout analysis. Owing to recent advances in communication and information technology, methods of data storage, extraction and processing are evolving rapidly. A large volume of digital documents (DD) is already available, and more DD are created continuously. DD include natural images, scanned documents, mails, books, archives, etc. Processing these DD, and extracting and understanding the relevant information they contain, is of prime business importance. Information in DD takes the form of tables, text, figures, images, diagrams, etc. Document images (DI) are DD stored as images, especially scanned office documents. Recent progress in artificial intelligence has created a growing expectation that data extraction from DD will be automated. Tables frequently appear in DI, and there is a pressing need to handle the tabular information they contain; otherwise, this information will remain unused and is likely to be lost. In this work, we present a weakly supervised learning-based approach to detect tables and identify their locations. The novelty of our work is that it does not require bounding-box annotations for table detection. Applying our technique to the ICDAR 2013 test data has yielded encouraging results.
