A Hybrid Method for Mathematical Expression Detection in Scientific Document Images

Bui Hai Phong, Thang Manh Hoang, Thi-Lan Le

doi:10.1109/access.2020.2992067

Open Access

https://doi.org/10.1109/access.2020.2992067

Abstract

Mathematical expressions are widely used in scientific documents, and detecting them automatically is a crucial step in analyzing those documents. The paper presents a unified system for the detection of both inline and isolated mathematical expressions in scientific document images, which usually consist of heterogeneous components (e.g., figures, tables, text and expressions). The system uses a two-stage hybrid method. First, layout analysis of the entire document image is performed to improve the accuracy of text line and word segmentation. Then, both isolated and inline expressions are detected. Both hand-crafted and deep learning features are extensively investigated and combined to improve the detection accuracy. Furthermore, a generic performance metric is applied to evaluate the system comprehensively. The proposed method has been evaluated on two public benchmark datasets (Marmot and GTDB). The accuracies obtained for isolated and inline expressions are 91.18% and 81.35% on the Marmot dataset and 89.51% and 80.20% on the GTDB dataset, respectively. A performance comparison with conventional methods is carried out to show the effectiveness of the proposed system. Moreover, extensive experiments examine the effect of document image resolution and post-processing techniques on mathematical expression detection.

Highlights

Mathematical expressions have been widely used in scientific documents, and a huge number of scientific documents have been produced over the years
Two public datasets that have been used for performance evaluation of mathematical expression detection are described
The improvements in the page segmentation and the classification of mathematical expressions and texts are combined to improve the performance of the overall detection system

Summary

INTRODUCTION

Mathematical expressions have been widely used in scientific documents, and a huge number of scientific documents have been produced over the years. Accurate segmentation of text lines and words makes it possible to achieve high accuracy in the detection of mathematical expressions. An analysis is carried out to evaluate the impact of page segmentation results on the accuracy of mathematical expression detection. (2) A hybrid method that combines both hand-crafted and deep learning features is proposed to improve the accuracy of the detection of mathematical expressions. Fast Fourier Transform (FFT) magnitude and phase are used as features for classifying isolated expressions versus normal text lines, while the parameters of the Gaussian distribution of the peaks and valleys of both the vertical and horizontal projection profiles of word images are used for classifying inline expressions versus textual words.
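
The two hand-crafted feature families mentioned above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the fixed FFT shape, the ink threshold, and the strict local-extremum rule for peaks and valleys are all assumptions made for the example. The summary does not state the feature dimensions or the classifier used.

```python
import numpy as np


def fft_features(line_img, size=(32, 128)):
    """Magnitude and phase of the 2-D FFT of a grayscale text-line image.

    `size` is a hypothetical fixed shape: the input is zero-padded or cropped
    to it so that every line yields a vector of the same length.
    """
    img = np.asarray(line_img, dtype=float)
    spectrum = np.fft.fftshift(np.fft.fft2(img, s=size))
    magnitude = np.log1p(np.abs(spectrum))  # log scale compresses the dynamic range
    phase = np.angle(spectrum)
    return np.concatenate([magnitude.ravel(), phase.ravel()])


def peaks_and_valleys(profile):
    """Values at strict local maxima and minima of a 1-D profile (assumed rule)."""
    p = np.asarray(profile, dtype=float)
    mid, left, right = p[1:-1], p[:-2], p[2:]
    peaks = mid[(mid > left) & (mid > right)]
    valleys = mid[(mid < left) & (mid < right)]
    return peaks, valleys


def gaussian_params(values):
    """Mean and standard deviation of the values; zeros if there are none."""
    if values.size == 0:
        return 0.0, 0.0
    return float(values.mean()), float(values.std())


def profile_features(word_img, ink_threshold=128):
    """Eight features: (mean, std) of peaks and of valleys, for the horizontal
    and the vertical projection profile of a word image.

    Assumes dark ink on a light background; `ink_threshold` is hypothetical.
    """
    ink = np.asarray(word_img) < ink_threshold
    feats = []
    for axis in (1, 0):  # axis=1: ink count per row; axis=0: ink count per column
        profile = ink.sum(axis=axis)
        for values in peaks_and_valleys(profile):
            feats.extend(gaussian_params(values))
    return np.array(feats)
```

With these assumptions, `fft_features` returns a vector of length 2 × 32 × 128 for any line image and `profile_features` returns 8 values per word image. Either vector could then be passed to a standard classifier. Cropping large images to `size` discards content, so a fuller pipeline would probably resize the image first.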

RELATED WORK

EVALUATION METRIC

CONCLUSION AND FUTURE WORKS

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 21
License type: CC BY 4.0
