Abstract

We propose a novel hybrid approach that fuses traditional computer vision techniques with deep learning models to detect figures and formulas from document images. The proposed approach first fuses the different computer vision based image representations, i.e., color transform, connected component analysis, and distance transform, termed as Fi-Fo image representation. The Fi-Fo image representation is then fed to deep models for further refined representation-learning for detecting figures and formulas from document images. The proposed approach is evaluated on a publicly available ICDAR-2017 Page Object Detection (POD) dataset and its corrected version. It produces the state-of-the-art results for formula and figure detection in document images with an f1-score of 0.954 and 0.922, respectively. Ablation study results reveal that the Fi-Fo image representation helps in achieving superior performance in comparison to raw image representation. Results also establish that the hybrid approach helps deep models to learn more discriminating and refined features.

Highlights

  • Digitization of document images is a growing need for commercial and non-commercial entities, for example, banks, industries, educational institutes, and libraries

  • ICDAR-2017 Page Object Detection (POD) was released recently for a competition focused on figure, formula, and table detection from document images

  • We evaluate the deformable variants of Faster-RCNN, R-fully convolutional networks (FCNs), and FPN

Read more

Summary

Introduction

Digitization of document images is a growing need for commercial and non-commercial entities, for example, banks, industries, educational institutes, and libraries. Aside from record-keeping, it significantly improves the availability of data just at a click and/or a tap from anywhere in the world, at any time. These digitized documents can be processed in an automated fashion given that the information contained in those documents can be extracted reliably. Reliable extraction of information from documents has been a major focus of the document analysis community for decades [1,2,3,4]

Objectives
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.