Abstract

Domain-specific knowledge graphs usually require deeper and more accurate knowledge than general-purpose ones. Existing academic knowledge graphs mainly focus on authors, abstracts, keywords, and citations, which help explore the themes of papers and analyze relationships between papers. However, these contents are summaries that reveal only shallow meaning and do not touch the core of a scientific paper. Mathematical models, which existing knowledge graphs ignore, are what authors really want to express through their papers. Knowledge from mathematical models makes it possible to use knowledge graphs for mathematical derivation, not just literal reasoning. To model this knowledge, we propose a knowledge graph construction framework named M2R, from Mathematical Models to Resource Description Framework. Mathematical models are usually described in formulae. We first identify formula positions according to pre-defined rules and locate the contexts that explain the variables in the formulae. Next, we extract the formulae and their related contexts from PDF papers as images and employ optical character recognition to identify the image contents. Then, regular expressions designed around sentence patterns are used to extract variable symbols and variable explanations. Finally, the formulae are regarded as relations between the variables to form triples whose subjects and objects are the variables and whose predicates are the formulae. Similar triples are fused to generate the final knowledge graph. Experimental results demonstrate that the precision of the formula extraction reaches up to 76.97%. In addition, a case study shows that we can effectively extract formulae and related variables and construct a knowledge graph of the mathematical models in scientific papers.

Keywords: Knowledge graph construction; Scientific papers; Mathematical models; Formulae; Variables
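
The last two steps of the pipeline (regular-expression extraction of variables, then triple formation) can be illustrated with a minimal sketch. It is not the M2R implementation: the sentence pattern, the function names `extract_variables`, `build_triples`, and `fuse`, the choice of unordered variable pairs, and the use of exact-duplicate removal as a stand-in for fusing similar triples are all assumptions made for illustration. The input is assumed to be text already recovered by OCR.

```python
import re
from itertools import combinations

# Hypothetical sentence pattern: "<symbol> is/denotes/represents [the] <explanation>".
# The actual patterns used by M2R are not given in the abstract.
VAR_PATTERN = re.compile(
    r"\b([A-Za-z](?:_[A-Za-z0-9]+)?)\s+(?:is|denotes|represents)\s+(?:the\s+)?([^,.;]+)"
)


def extract_variables(context):
    """Map each variable symbol to the explanation found in the context text."""
    return {m.group(1): m.group(2).strip() for m in VAR_PATTERN.finditer(context)}


def build_triples(formula, variables):
    """Use the formula as the predicate linking each unordered pair of variables."""
    return [(s, formula, o) for s, o in combinations(sorted(variables), 2)]


def fuse(triples):
    """Simplified fusion (assumption): drop exact duplicates, keep first-seen order."""
    return list(dict.fromkeys(triples))


if __name__ == "__main__":
    context = "where F is the force, m is the mass, and a is the acceleration."
    variables = extract_variables(context)
    for triple in fuse(build_triples("F = m a", variables)):
        print(triple)
```

Because the pattern stops an explanation at the first comma, period, or semicolon, it relies on explanations being punctuated; a sentence such as "m is the mass and a is the acceleration" would need a richer pattern.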
