Investigating Models for the Transcription of Mathematical Formulas in Images

Christian Feichter, Tim Schlippe

doi:10.3390/app14031140

Abstract

The automated transcription of mathematical formulas is a complex challenge of great importance for the digital processing and comprehensibility of mathematical content. Consequently, our goal was to analyze state-of-the-art approaches for transcribing printed mathematical formulas in images into spoken English text. We focused on two approaches: (1) combining mathematical expression recognition (MER) models with natural language processing (NLP) models, so that formula images are converted first into LaTeX code and then into text, and (2) converting formula images directly into text using vision-language (VL) models. Since no dataset of printed mathematical formulas with corresponding English transcriptions existed, we created a new dataset, Formula2Text, for fine-tuning and evaluating our systems. Our best system for (1) combines the MER model LaTeX-OCR with the NLP model BART-Base and achieves a translation error rate of 36.14% against our reference transcriptions. In the task of converting LaTeX code to text, BART-Base, T5-Base, and FLAN-T5-Base even outperformed ChatGPT, GPT-3.5 Turbo, and GPT-4. For (2), the best VL model, TrOCR, achieves a translation error rate of 42.09%. This demonstrates that VL models, which are predominantly used for classical image captioning tasks, have significant potential for transcribing mathematical formulas in images.
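Both systems are scored with translation error rate (TER): the number of word edits needed to turn a system's transcription into the reference transcription, divided by the reference length. As a rough illustration of how such a score behaves, here is a minimal Python sketch, assuming lowercase whitespace tokenization and a single reference per formula. It deliberately omits TER's block-shift edits, so it is effectively a word error rate; because shifts can only reduce the edit count, it should never report a lower score than a shift-aware TER. The function names are hypothetical, and this is not the authors' evaluation code.

```python
# Approximate translation error rate (TER) for formula transcriptions.
# Assumptions: lowercase whitespace tokenization, one reference per formula,
# and NO block shifts (full TER also allows moving word sequences at cost 1).


def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Minimum word insertions, deletions and substitutions turning hyp into ref."""
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hypothesis word h
                curr[j - 1] + 1,         # insert reference word r
                prev[j - 1] + (h != r),  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[-1]


def _tokens(text: str) -> list[str]:
    # Hypothetical normalization: lowercase, split on whitespace, keep punctuation.
    return text.lower().split()


def approx_ter(hypothesis: str, reference: str) -> float:
    """Shift-free TER of one transcription, in percent."""
    ref = _tokens(reference)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * word_edit_distance(_tokens(hypothesis), ref) / len(ref)


def corpus_approx_ter(hypotheses: list[str], references: list[str]) -> float:
    """Pool edits and reference words over a test set, then normalize once."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one hypothesis per reference")
    edits = sum(
        word_edit_distance(_tokens(h), _tokens(r))
        for h, r in zip(hypotheses, references)
    )
    ref_words = sum(len(_tokens(r)) for r in references)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * edits / ref_words


# One substituted word ("square" vs. "squared") out of four reference words.
print(approx_ter("x square plus one", "x squared plus one"))  # 25.0
```

Pooling edits over the whole test set before dividing, as `corpus_approx_ter` does, is one common convention for reporting a single corpus-level figure; averaging per-formula scores instead would weight short formulas more heavily.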

Journal: Applied Sciences
Publication Date: Jan 29, 2024
Citations: 2
License type: CC BY 4.0

Similar Papers

Identification of muscle-invasion status in bladder cancer patients using natural language processing and machine learning.
Ruixin Yang ... Di Zhu
Journal of Clinical Oncology | VOL. 40
20 Feb 2022

POS0262 IDENTIFYING EROSIVE DISEASE FROM RADIOLOGY REPORTS OF VETERANS WITH INFLAMMATORY ARTHRITIS USING NATURAL LANGUAGE PROCESSING
G Penmetsa ... S Pei
Annals of the Rheumatic Diseases | VOL. 80
19 May 2021

Leveraging natural language processing models to automate speech-intelligibility scoring
Björn Herrmann
Speech, Language and Hearing | VOL. ahead-of-print
09 Jul 2024

Understanding older people's voice interactions with smart voice assistants: a new modified rule-based natural language processing model with human input.
Zhengxu Yan ... Julie Blaskewicz Boron
Frontiers in digital health | VOL. 6
01 Jan 2024