Abstract

Automatically detecting and parsing tables into an indexable and searchable format is an important problem in document digitization, drawing on computer vision, machine learning, and optical character recognition. This paper presents a simple model based on a deep neural network architecture that combines recent advances in computer vision and machine learning to detect tables and convert them into a format that can be edited or searched. The motivation for this work is to develop a sound method for extracting the vast body of knowledge available in physical documents so that it can power data-driven tools that support decisions in fields such as healthcare and finance. The model uses a YOLO-based object detector, trained to maximize the Intersection over Union (IoU) of the detected table regions within the document image, and a novel OCR-based algorithm to parse each table detected in the document. Past works have focused on documents and images containing level, well-aligned tables; this paper presents our findings when the model is run on a set of skewed image datasets. Experiments on the Marmot and PubLayNet datasets show that the proposed method is highly accurate and generalizes to different table formats. At an IoU threshold of 50%, we achieve a mean Average Precision (mAP) of 98% and an average IoU of 88.81% on the PubLayNet dataset. With the same IoU threshold, we achieve an mAP of 95.07% and an average IoU of 75.57% on the Marmot dataset.
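
For reference, the IoU metric reported above can be computed for two axis-aligned boxes as in the sketch below. This is a minimal illustration of the metric only, not code from the paper; the (x_min, y_min, x_max, y_max) box format and the variable names are our own assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is assumed
    for illustration and is not taken from the paper.
    """
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Intersection area (zero if the boxes do not overlap)
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    # Union area = sum of both box areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Example: a predicted table region vs. a ground-truth region
print(iou((10, 10, 110, 60), (20, 15, 120, 65)))  # ~0.68
```

A detection counts as correct at the 50% threshold used in the results when this value is at least 0.5 for the predicted table region and its ground-truth counterpart.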
