Abstract
Mathematical formulas are among the most vital components of a scientific document, since they explicitly describe complex concepts and ideas. In addition to numerical calculations, they are used to clarify definitions and to disambiguate explanations written in natural language. Although formulas have a noteworthy impact in scientific documents, existing information retrieval systems offer only limited support for retrieving such documents through formula-based queries. To address this, we study and implement a formula embedding approach that encodes each formula as an embedding vector. For the encoding, we use a pretrained sentence bidirectional encoder representations from transformers (Sentence-BERT) model. The proposed embedding model takes a LaTeX formula as input and outputs a fixed-dimensional embedding representation. In addition, a Siamese network is used to capture the semantic meaning of the formulas. The embeddings of the indexed formulas are then compared with the embedding of the query formula using cosine similarity. The performance of the proposed methodology is evaluated on the Math Stack Exchange corpus of ARQMath 2020, and the obtained results show that the approach contributes notably to the task of formula retrieval.
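The retrieval step described above, comparing the query embedding with each indexed formula embedding by cosine similarity, can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are made-up stand-ins for the fixed-dimensional embeddings a Sentence-BERT model would produce, and the names `cosine_similarity` and `rank_formulas` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_formulas(query_vec, corpus):
    """Return (latex, score) pairs sorted by decreasing similarity to the query."""
    scored = [(latex, cosine_similarity(query_vec, vec))
              for latex, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values, not model output).
corpus = {
    r"a^2 + b^2 = c^2": [0.9, 0.1, 0.0],
    r"x^2 + y^2 = z^2": [0.8, 0.2, 0.1],
    r"\int_0^1 x \, dx": [0.0, 0.1, 0.9],
}
query = [0.9, 0.1, 0.0]  # pretend embedding of the query formula

ranking = rank_formulas(query, corpus)
for latex, score in ranking:
    print(f"{score:.4f}  {latex}")
```

In a real system the toy vectors would be replaced by the model's embeddings of the LaTeX strings, and the ranked list would be cut off at the top k results.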