Abstract

Formulas appear in many kinds of documents and in different formats. Extracting them and normalizing them into a single, unified form is a precondition for mathematical retrieval. In this paper, a method for extracting and converting the formulas in Word documents is built for mathematical expression retrieval. First, the mathematical expressions in Word documents are detected by processing OLE objects. Then, matching rules for formula format conversion are defined. Finally, the extracted mathematical expressions in OMML format are converted into LaTeX format following the defined rules and stored in a txt file. Furthermore, formulas in MathType format are stored as bitmap files and converted into LaTeX through a formula recognition and reconstruction module. Experiments show the effectiveness of the designed approach.
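As a rough illustration of rule-based OMML-to-LaTeX conversion, the following sketch maps a handful of OMML elements (text runs, fractions, superscripts, subscripts and radicals) to LaTeX templates. It is not the paper's rule set, which the abstract does not spell out. The names `omml_to_latex`, `RULES` and the per-element rule functions are hypothetical, and only a small subset of OMML is covered.

```python
import xml.etree.ElementTree as ET

# OMML namespace used by Word for m:oMath content
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def q(tag):
    """Qualify a local OMML tag name with its namespace."""
    return f"{{{M}}}{tag}"


def conv(el):
    """Recursively convert an OMML element to LaTeX (hypothetical helper)."""
    if el is None:
        return ""
    tag = el.tag.split("}")[-1]
    rule = RULES.get(tag)
    if rule is not None:
        return rule(el)
    if tag.endswith("Pr"):
        return ""  # property elements (fPr, radPr, ...) carry formatting only
    # Containers such as oMath, e, num, den: concatenate converted children
    return "".join(conv(c) for c in el)


def run_rule(el):
    # m:r holds literal text in one or more m:t children
    return "".join(t.text or "" for t in el.iter(q("t")))


def frac_rule(el):
    return r"\frac{" + conv(el.find(q("num"))) + "}{" + conv(el.find(q("den"))) + "}"


def sup_rule(el):
    return "{" + conv(el.find(q("e"))) + "}^{" + conv(el.find(q("sup"))) + "}"


def sub_rule(el):
    return "{" + conv(el.find(q("e"))) + "}_{" + conv(el.find(q("sub"))) + "}"


def rad_rule(el):
    deg = conv(el.find(q("deg")))
    body = conv(el.find(q("e")))
    return rf"\sqrt[{deg}]{{{body}}}" if deg else rf"\sqrt{{{body}}}"


# Matching rules: OMML local tag name -> conversion function (illustrative subset)
RULES = {"r": run_rule, "f": frac_rule, "sSup": sup_rule, "sSub": sub_rule, "rad": rad_rule}


def omml_to_latex(xml_text):
    """Convert one serialized m:oMath element to a LaTeX string."""
    return conv(ET.fromstring(xml_text))


if __name__ == "__main__":
    sample = f"""<m:oMath xmlns:m="{M}">
      <m:sSup><m:e><m:r><m:t>x</m:t></m:r></m:e><m:sup><m:r><m:t>2</m:t></m:r></m:sup></m:sSup>
      <m:r><m:t>+</m:t></m:r>
      <m:f><m:num><m:r><m:t>a</m:t></m:r></m:num><m:den><m:r><m:t>b</m:t></m:r></m:den></m:f>
    </m:oMath>"""
    print(omml_to_latex(sample))
```

Because each rule is a small function keyed by the OMML tag, supporting another construct (for example `m:nary` for sums and integrals) means adding one entry to `RULES`. In the pipeline the abstract describes, the resulting strings would then be written to the txt file.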
