An effective method for figures and tables detection in academic literature

Fengchang Yu, Jiani Huang, Zhuoran Luo, Li Zhang, Wei Lu

doi:10.1016/j.ipm.2023.103286

Abstract

Figures and tables in scientific articles serve as data sources for various academic data mining tasks. These tasks require each figure or table to be extracted in its entirety. However, existing studies measure algorithm performance with the same IoU (Intersection over Union) or IoU-based metrics that are used for natural images. In scientific figure and table detection, there is a gap between a high IoU and a complete detection. In this paper, we demonstrate the existence of this gap and suggest that its leading cause is detection error in the boundary area. We propose an effective detection method that cascades semantic segmentation and contour detection. The semantic segmentation model adopts a novel loss function that increases the weights of boundary parts, and a categorized dice metric that evaluates the imbalanced pixels in the segmentation result. Under rigorous testing criteria, the proposed method yields a page-level F1 of 0.983, exceeding state-of-the-art academic figure and table detection methods. The results can significantly improve data quality and reduce data cleaning costs for downstream applications.
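
The gap between a high IoU and a complete detection can be illustrated with a small sketch. The boxes, the `covers` helper, and its tolerance parameter are hypothetical and chosen only for illustration; in particular, the containment check below is an assumed stand-in for "entirety", not the testing criterion used in the paper.

```python
from typing import Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def covers(pred: Box, truth: Box, tol: float = 0.0) -> bool:
    """Assumed entirety check: pred contains truth, within tol pixels."""
    return (
        pred[0] <= truth[0] + tol
        and pred[1] <= truth[1] + tol
        and pred[2] >= truth[2] - tol
        and pred[3] >= truth[3] - tol
    )


# Hypothetical ground-truth figure region and a prediction that
# clips the bottom 20 pixels (for example, an axis label).
truth = (100.0, 100.0, 500.0, 400.0)
pred = (100.0, 100.0, 500.0, 380.0)

print(f"IoU = {iou(pred, truth):.3f}")       # well above a typical 0.9 threshold
print(f"complete = {covers(pred, truth)}")   # yet the figure is cut off
```

In this example the prediction would pass a common IoU threshold even though part of the figure is missing, which is the kind of boundary error the abstract describes.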

Journal: Information Processing & Management
Publication Date: Feb 1, 2023
Citations: 4

Similar Papers

Semantic Segmentation Model for Road Scene Based on Encoder-Decoder Structure
Yuanzhe Peng ... Weichao Han
01 Dec 2019

A comparative study of pre-trained convolutional neural networks for semantic segmentation of breast tumors in ultrasound
Wilfrido Gómez-Flores ... Wagner Coelho De Albuquerque Pereira
Computers in Biology and Medicine | VOL. 126
08 Oct 2020

Semantic segmentation using deep learning to extract total extraocular muscles and optic nerve from orbital computed tomography images
Fubao Zhu ... Weihua Zhou
Optik | VOL. 244
02 Jul 2021

Semantic segmentation: A modern approach for identifying soil clods in precision farming
Afshin Azizi ... Hamid Abrishami Moghaddam
Biosystems Engineering | VOL. 196
30 Jun 2020
