Deep Learning in Time-Frequency Domain for Document Layout Analysis

Felipe Grijalva, Juan Carlos Rodriguez, Erick Santos, Byron Acuna, Julio Cesar Larco

doi:10.1109/access.2021.3125913

Abstract

Document layout analysis plays an important role in document understanding: it identifies and classifies the different components of digital documents. Currently, no universal algorithm fits all types of digital documents. This work presents a novel approach for identifying tables, figures, isolated equations, and text regions in scientific papers using deep learning and computer vision techniques. The proposed approach is a three-stage system: (i) obtaining the spectrograms of the horizontal and vertical intensity histograms of segmented regions of interest; (ii) labeling the segmented regions of interest as text, table, or figure using a deep convolutional neural network classifier; and (iii) identifying isolated equations in text regions using Bag of Visual Words (BoVW) with Zernike moments. We built a new dataset of 11,007 papers for the experiments and evaluated the model with two common segmentation metrics: (1) the Adjusted Rand Index (ARI) and (2) the Variation of Information (VI). The proposed document layout analysis system reached an overall accuracy of 96.2685%, outperforming prior art at a lower computational cost.
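Stage (i) can be illustrated with a minimal sketch, assuming a binarized region given as a list of rows in which 1 marks an ink pixel. The function names, the Hann window, and the `window=8` and `hop=4` values are illustrative assumptions; the abstract does not state the transform parameters the authors used.

```python
import cmath
import math


def projection_profiles(image):
    """Return (horizontal, vertical) ink-count profiles of a binary image.

    `image` is a list of rows; 1 marks a foreground (ink) pixel.
    """
    horizontal = [sum(row) for row in image]       # one value per row
    vertical = [sum(col) for col in zip(*image)]   # one value per column
    return horizontal, vertical


def spectrogram(signal, window=8, hop=4):
    """Magnitude short-time Fourier transform of a 1-D profile.

    Uses a Hann window and a naive DFT (assumed parameters, not the paper's).
    Returns a list of frames; each frame holds window // 2 + 1 magnitudes.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window - 1))
            for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        chunk = [signal[start + n] * hann[n] for n in range(window)]
        frame = []
        for k in range(window // 2 + 1):
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / window)
                        for n in range(window))
            frame.append(abs(coeff))
        frames.append(frame)
    return frames


# Toy region: alternating bands of two ink rows and two blank rows, 16 columns wide.
region = [[1] * 16 if r % 4 < 2 else [0] * 16 for r in range(24)]
h_profile, v_profile = projection_profiles(region)
h_spec = spectrogram(h_profile)
v_spec = spectrogram(v_profile)
print(len(h_spec), len(h_spec[0]))
```

In a pipeline of the kind the abstract describes, the two time-frequency maps would then be passed to the classifier as image-like inputs; how they are resized or stacked is not specified here.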

Highlights

Document layout analysis (DLA) [1] is still one of the most challenging areas of information retrieval [2] due to the wide variety of documents that can be authored and the lack of structured information [3] in standardized formats for exchanging information such as the Portable Document Format (PDF).
We present a new approach for DLA, which consists of: 1) segmentation of the regions of interest; 2) generation of spectrograms from the horizontal and vertical pixel projection profiles of the regions of interest; 3) implementation of a deep convolutional neural network (CNN) trained for three classes: text, table, and figure; and 4) use of the Bag of Visual Words (BoVW) technique to identify lines with isolated equations within the text regions.
Sparse ratio: [15] and [17] found that lines with isolated equations produce a higher sparse ratio than lines without isolated equations. In addition to these three features, we propose a new set of features based on a Bag of Visual Words (BoVW) of each symbol contained in a line.
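The BoVW idea in the highlights above can be sketched as follows. This is a generic bag-of-visual-words histogram, not the authors' implementation: the 2-D descriptors are hypothetical stand-ins for per-symbol Zernike moments, and the three-word codebook is made up for illustration (in practice a codebook is usually learned by clustering training descriptors).

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (Euclidean distance)."""
    distances = [math.dist(descriptor, word) for word in codebook]
    return distances.index(min(distances))


def bovw_histogram(descriptors, codebook):
    """L1-normalised histogram of visual-word counts for one text line."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical 2-D descriptors standing in for per-symbol Zernike moments.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
line_symbols = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
hist = bovw_histogram(line_symbols, codebook)
print(hist)
```

Each text line is thereby reduced to a fixed-length vector regardless of how many symbols it contains, which is what lets it sit alongside scalar features such as the sparse ratio in a single feature vector.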

Summary

Introduction

Document layout analysis (DLA) [1] is still one of the most challenging areas of information retrieval [2] due to the wide variety of documents that can be authored and the lack of structured information [3] in standardized formats for exchanging information such as the Portable Document Format (PDF). The main feature of digitally created PDFs is that they preserve the visual structure of the document on any electronic device, which has made PDF the current standard format for electronic document exchange [4]. There is no universal algorithm [5] that fully understands all regions of a digital document, i.e., one that identifies and segments all the individual elements such as tables, graphs, inline/isolated equations, paragraphs, etc. Approaches to identifying and classifying the elements of a digital document based on its visual structure can be grouped into three categories [6] according to the regions they analyze: (i) foreground regions, (ii) background regions, and (iii) both foreground and background regions. Foreground-based approaches perform page segmentation by analyzing the foreground pixels, which are normally the text characters. Approaches that analyze both foreground and background pixels try to combine the results of the two individual approaches.

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 3	License type: CC BY 4.0
