Abstract

In any document, graphical elements like tables, figures, and formulas contain essential information. Processing and interpreting this information requires specialized algorithms, because off-the-shelf OCR components cannot handle it reliably. An essential step in document analysis pipelines is therefore to detect these graphical components. Detection leads to a high-level conceptual understanding of documents, which makes their digitization viable. Since the advent of deep learning, the performance of object detection has improved manyfold. This work outlines and summarizes the deep learning approaches for detecting graphical page objects in document images. We discuss the most relevant deep learning-based approaches and the state of the art in graphical page object detection, providing a comprehensive understanding of the current state of the art and related challenges. Furthermore, we discuss leading datasets along with their quantitative evaluation, and briefly discuss promising directions for further improvement.

Highlights

  • We have presented a thorough analysis of the recent state-of-the-art approaches that have addressed the problem of graphical page object detection in document images

  • We provide an evaluative comparison among the state-of-the-art graphical page object detection systems

  • By leveraging the segmentation loss of Mask R-CNN, researchers in the document image analysis community have improved the performance of graphical page object detection systems


Summary

Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

It is evident that even the state-of-the-art OCR method [6] fails to extract precise information from figures, tables, and formulas. Another application of such page object detection methods is document retrieval systems [7,8], where a document image containing a specific type of page object is required. Although the approaches leveraging these datasets have significantly improved the state of the art, a consolidated comparison among these approaches is missing. In this survey paper, we have presented a thorough analysis of the recent state-of-the-art approaches that have addressed the problem of graphical page object detection in document images.

Discussion and Conclusion
Traditional Approaches
Methodologies
Method
Faster R-CNN
Mask R-CNN
Deformable Convolutions
Dynamic Programming Based Approach
Fully Convolutional Neural Networks
Datasets
ICDAR-17 POD
PubLayNet
DocBank
Marmot
TableBank
IIIT-AR-13k
DeepFigures
ICDAR-13
ICDAR-2019
Evaluation
Precision
Intersection Over Union
Evaluation for Table Detection
Evaluation for Figure Detection
Evaluations for Formula Detection
Discussion and Conclusions
Difficulties and Challenges
Future Work