Abstract

Retrieving scientific documents whose main content is mathematical expressions requires considering both the expressions and the text that surrounds them. However, mathematical expressions differ from natural-language text in both grammar and semantics, which makes it difficult to integrate the two kinds of features into a single retrieval method. This study proposes a scientific document retrieval method based on HFS (Hesitation Fuzzy Sets) and BERT (Bidirectional Encoder Representations from Transformers), exploiting the strength of HFS in multi-attribute decision making and of BERT in context-dependent similarity calculation. Mathematical expressions are analyzed and the membership degrees of their symbols' multiple attributes are calculated to obtain expression similarity, which improves the accuracy of mathematical expression recall. The text surrounding each recalled expression is then extracted, and BERT is used to calculate its similarity to the query text. Finally, the recalled scientific documents are ranked by context similarity to produce the final retrieval result. Experiments were carried out on 10,372 Chinese and 11,770 English scientific documents from the NTCIR extended data set. The average MAP@$k$ ($k=10$) of the scientific document recall results was 74.13%, and the average nDCG@$n$ ($n=10$) of the scientific document ranking was 86.04%.
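
The abstract does not give the membership functions or the similarity formula, so the following is only a minimal sketch under stated assumptions, not the authors' method. Each symbol attribute of an expression is represented as a hesitant fuzzy element (a set of candidate membership degrees in [0, 1]); two elements are compared with one standard hesitant fuzzy measure (1 minus the normalized Hamming distance, after sorting both and padding the shorter one with its largest value); and per-attribute similarities are combined by a weighted average. The attribute names (`symbol`, `position`, `operator`), the weights, and the membership values are all hypothetical.

```python
from typing import Dict, List

# A hesitant fuzzy element (HFE): several candidate membership degrees in [0, 1].
HFE = List[float]


def _extend(h: HFE, length: int) -> List[float]:
    """Sort ascending and pad with the largest value (optimistic extension)."""
    s = sorted(h)
    return s + [s[-1]] * (length - len(s))


def hfe_similarity(h1: HFE, h2: HFE) -> float:
    """1 minus the normalized Hamming distance between two HFEs."""
    length = max(len(h1), len(h2))
    a, b = _extend(h1, length), _extend(h2, length)
    distance = sum(abs(x - y) for x, y in zip(a, b)) / length
    return 1.0 - distance


def expression_similarity(query: Dict[str, HFE],
                          candidate: Dict[str, HFE],
                          weights: Dict[str, float]) -> float:
    """Weighted average of per-attribute HFE similarities (weights are assumed)."""
    total = sum(weights.values())
    return sum(w * hfe_similarity(query[attr], candidate[attr])
               for attr, w in weights.items()) / total


# Hypothetical attributes, weights, and membership degrees for illustration only.
weights = {"symbol": 0.5, "position": 0.3, "operator": 0.2}
query = {"symbol": [0.9, 1.0], "position": [0.8], "operator": [0.7, 0.9]}
candidate = {"symbol": [0.9], "position": [0.6, 0.8], "operator": [0.7]}

score = expression_similarity(query, candidate, weights)
print(round(score, 3))  # 0.925
```

In a recall stage, candidate expressions would be sorted by this score and the top ones passed on to the context-similarity step.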

Summary

INTRODUCTION

Scientific documents carry important information about scientific research and technological development. To optimize the ranking of scientific documents based on mathematical expressions, Hussain and Khoja [8] developed a retrieval method based on the semantic information of mathematical expressions. However, integrating mathematical expressions with text information in scientific document retrieval still poses problems. This study proposes a scientific document retrieval method based on HFS (Hesitation Fuzzy Sets) [15] and BERT (Bidirectional Encoder Representations from Transformers) [16]. The system integrates mathematical expressions and their contextual text features to improve retrieval accuracy. The BERT model is used to calculate the similarity between the query text and the text corresponding to each expression recalled by the Mathematical Expression Similarity module. In the paper's notation, $exp_n$ denotes the $n$-th expression and $selists_n$ denotes the list of sentences corresponding to the $n$-th expression.
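
The context re-ranking step can be sketched as follows. This is a minimal sketch, not the authors' implementation: the hypothetical `embed` function is a bag-of-words stand-in for BERT sentence embeddings so that the example runs without a model; an expression's context score is assumed to be the cosine similarity of its best-matching sentence in $selists_n$; and the document IDs, expressions, and sentences are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """HYPOTHETICAL stand-in for a BERT sentence embedding:
    a bag-of-words term-count vector, so the sketch runs without a model."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def context_similarity(query_text: str, selists: List[str]) -> float:
    """Score an expression by its best-matching context sentence (max is an assumption)."""
    q = embed(query_text)
    return max((cosine(q, embed(s)) for s in selists), default=0.0)


def rerank(query_text: str,
           recalled: List[Tuple[str, str, List[str]]]) -> List[Tuple[str, float]]:
    """recalled holds (doc_id, exp_n, selists_n) triples from the expression recall stage."""
    scored = [(doc_id, context_similarity(query_text, selists))
              for doc_id, _exp, selists in recalled]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented example: the same expression recalled from two documents.
recalled = [
    ("doc_a", "E = mc^2", ["the energy of a particle at rest"]),
    ("doc_b", "E = mc^2", ["mass energy equivalence in special relativity"]),
]
ranking = rerank("mass energy equivalence", recalled)
print(ranking[0][0])  # doc_b
```

Replacing `embed` with pooled BERT outputs would leave `cosine`, `context_similarity`, and `rerank` unchanged, which is why the embedding is isolated in one function.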

MATHEMATICAL EXPRESSION SIMILARITY
EXPRESSION ATTRIBUTE DECOMPOSITION
KEYWORDS EXTRACTION IN THE CONTEXT OF
CONCLUSION