Design of a mathematical expression understanding system

Hsi-Jian Lee, Jiumn-Shine Wang

doi:10.1016/s0167-8655(97)87048-1

Abstract

A scientific document usually consists of text and mathematical expressions. In this paper, we present a system for segmenting and understanding the text and mathematical expressions in a document. The system can be divided into six stages: page segmentation and labeling, character segmentation, feature extraction, character recognition, expression formation, and error correction and expression extraction. After extracting all text lines in a document, we separate the symbols in each text line and calculate a direction-feature vector and an aspect ratio for each symbol. A nearest-neighbor algorithm then recognizes the characters. In the expression formation stage, we build a symbol relation tree for each text line that represents the relationships among its symbols. Each text line is decomposed into a collection of primitive tokens: operands, operators and separators. Heuristic rules based on these primitive tokens are used to correct text recognition errors. Finally, we extract all mathematical expressions according to basic expression forms. Several scanned document pages were used to test the method. All mathematical expressions were understood, although a few symbols in the generated expressions were misrecognized. The average recognition rate was 96.16%.
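As a rough illustration of the feature-extraction and character-recognition stages, the Python sketch below computes a direction-feature vector plus an aspect ratio for a binary symbol bitmap and classifies it with a 1-nearest-neighbor rule. The abstract does not define the direction features, so the zoning scheme (a 2×2 grid, counting ink pixels whose neighbor along each of four directions is also ink), the Euclidean distance, and all function names here are assumptions for illustration, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels (1 = ink), assumed cropped to the symbol's bounding box

# Assumed stroke directions as (row, col) offsets: horizontal, vertical, two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def direction_features(bitmap: Bitmap, zones: int = 2) -> List[float]:
    """Hypothetical direction feature: in each zone of a zones x zones grid, count
    ink pixels whose neighbor in each direction is also ink, then normalize to sum 1."""
    h, w = len(bitmap), len(bitmap[0])
    feats = [0.0] * (zones * zones * len(DIRECTIONS))
    for r in range(h):
        for c in range(w):
            if not bitmap[r][c]:
                continue
            zone = (r * zones // h) * zones + (c * zones // w)
            for d, (dr, dc) in enumerate(DIRECTIONS):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and bitmap[rr][cc]:
                    feats[zone * len(DIRECTIONS) + d] += 1
    total = sum(feats) or 1.0  # avoid division by zero for isolated dots
    return [f / total for f in feats]


def feature_vector(bitmap: Bitmap) -> List[float]:
    """Direction features followed by the aspect ratio (height / width)."""
    return direction_features(bitmap) + [len(bitmap) / len(bitmap[0])]


def nearest_neighbor(x: Sequence[float], prototypes: List[Tuple[List[float], str]]) -> str:
    """1-NN: return the label of the prototype with the smallest Euclidean distance."""
    best_label, best_dist = "", math.inf
    for proto, label in prototypes:
        dist = math.dist(x, proto)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    minus = [[1, 1, 1, 1, 1]]
    bar = [[1], [1], [1], [1], [1]]
    protos = [(feature_vector(minus), "-"), (feature_vector(bar), "|")]
    print(nearest_neighbor(feature_vector([[1, 1, 1, 1]]), protos))  # prints "-"
```

Because the aspect ratio is unbounded while the direction features sum to 1, it can dominate the distance; a practical system would likely scale or weight it. The two-prototype set ('-' and '|') exists only to show the call pattern.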
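The decomposition of a text line into primitive tokens (operands, operators and separators) and the rule-based error correction can be pictured with a similarly hedged sketch. It assumes the symbols of a line have already been recognized and linearized (it ignores the symbol relation tree). The operator and separator sets and the single correction rule (a look-alike letter such as 'O' or 'l' between two digits becomes the corresponding digit) are hypothetical stand-ins; the paper's actual heuristic rules are not given in the abstract.

```python
from typing import List, Tuple

# Hypothetical symbol classes; the paper's exact sets are not given in the abstract.
OPERATORS = set("+-*/=<>")
SEPARATORS = set("(),[]{}")
# Hypothetical confusion table: letter -> the digit it is easily mistaken for.
LETTER_TO_DIGIT = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"}


def correct(symbols: List[str]) -> List[str]:
    """Hypothetical heuristic rule: a look-alike letter between two digits becomes a digit."""
    out = list(symbols)
    for i in range(1, len(out) - 1):
        if out[i] in LETTER_TO_DIGIT and out[i - 1].isdigit() and out[i + 1].isdigit():
            out[i] = LETTER_TO_DIGIT[out[i]]
    return out


def tokenize(symbols: List[str]) -> List[Tuple[str, str]]:
    """Split a linearized line into (kind, text) primitive tokens."""
    tokens: List[Tuple[str, str]] = []
    for s in symbols:
        if s in OPERATORS:
            tokens.append(("operator", s))
        elif s in SEPARATORS:
            tokens.append(("separator", s))
        elif s.isdigit() and tokens and tokens[-1][0] == "operand" and tokens[-1][1].isdigit():
            tokens[-1] = ("operand", tokens[-1][1] + s)  # extend a multi-digit number
        else:
            tokens.append(("operand", s))
    return tokens


if __name__ == "__main__":
    line = list("2O1+x=(3l5)")  # 'O' and 'l' simulate misrecognized digits
    print(tokenize(correct(line)))
```

Correcting before tokenizing lets the digit-merging step see "201" and "315" as single numeric operands rather than a digit, a letter and another digit.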


Journal: Pattern Recognition Letters
Publication Date: Mar 1, 1997
Citations: 61

Similar Papers

Design of a mathematical expression recognition system
Hsi-Jian Lee ... Jiumn-Shine Wang
14 Aug 1995

Segmentation Method for Myanmar Character Recognition Using Block based Pixel Count and Aspect Ratio
Kyi Pyar Zaw ... Zin Mar Kyu
28 Oct 2017

Fast document image comparison in multilingual corpus without OCR
Yuping Lin ... Fang Wang
Multimedia Systems, Vol. 23
08 Oct 2015

Text Line Identification from a Multilingual Document
P.A Vijaya ... M.C Padma
01 Mar 2009
