Abstract

Achieving scientific document retrieval by considering the wealth of mathematical expressions and the semantic text they contain has become an inescapable trend. Current scientific document matching models focus solely on the textual features of expressions and frequently encounter hurdles like proliferative parameters and sluggish reasoning speeds in the pursuit of improved performance. To solve this problem, this paper proposes a scientific document retrieval method founded upon hesitant fuzzy sets (HFS) and local semantic distillation (LSD). Concretely, in order to extract both spatial and semantic features for each symbol within a mathematical expression, this paper introduces an expression analysis module that leverages HFS to establish feature indices. Secondly, to enhance contextual semantic alignment, the method of knowledge distillation is employed to refine the pretrained language model and establish a twin network for semantic matching. Lastly, by amalgamating mathematical expressions with contextual semantic features, the retrieval results can be made more efficient and rational. Experiments were implemented on the NTCIR dataset and the expanded Chinese dataset. The average MAP for mathematical expression retrieval results was 83.0%, and the average nDCG for sorting scientific documents was 85.8%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call