This study presents a framework for enriching Wikidata, an open and collaborative knowledge graph, with content from the Open Biological and Biomedical Ontologies (OBO) and with Medical Subject Headings (MeSH) keywords from PubMed publications. Both data sources were collected and classified using SPARQL queries over RDF knowledge graphs. An evaluation of the semantic alignment between OBO ontologies and Wikidata revealed significant gaps and distorted representations that call for both automated and manual correction. We used pointwise mutual information to extract biomedical relations among the 5000 most common MeSH keywords in PubMed, achieving an accuracy of 89.40% for superclass-based classification and 75.32% for relation type-based classification. Integrated Gradients was then used to refine the classification by removing irrelevant MeSH qualifiers, which improved overall efficiency. We also used MeSH keywords to look for PubMed reviews that could back otherwise unsupported Wikidata relations; 45.8% of these relations were not found in PubMed, indicating potential inconsistencies in Wikidata. The contributions of this study are improved methods for enriching Wikidata with biomedical information, validated semantic alignments, and efficient classification processes. The work enhances the interoperability and multilingual capabilities of biomedical ontologies and demonstrates the role of MeSH keywords in verifying semantic relations, contributing to more robust and accurate collaborative biomedical knowledge graphs.
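To make the pointwise mutual information step concrete, the following is a minimal sketch, not the study's implementation. It assumes each PubMed record is reduced to a list of MeSH keywords and that probabilities are estimated from record-level co-occurrence counts; the function name `pmi_scores` and the toy records are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(records):
    """Return PMI for every keyword pair that co-occurs in at least one record.

    Assumption: probabilities are estimated as relative frequencies over
    records, i.e. p(a) = (records containing a) / (total records).
    """
    n = len(records)
    single = Counter()  # records containing each keyword
    pair = Counter()    # records containing each keyword pair
    for keywords in records:
        unique = sorted(set(keywords))  # count each keyword once per record
        single.update(unique)
        pair.update(combinations(unique, 2))

    scores = {}
    for (a, b), n_ab in pair.items():
        # PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) )
        p_ab = n_ab / n
        p_a = single[a] / n
        p_b = single[b] / n
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Toy data, invented for illustration only.
records = [
    ["Aspirin", "Headache"],
    ["Aspirin", "Headache"],
    ["Aspirin", "Fever"],
    ["Insulin", "Diabetes Mellitus"],
]
scores = pmi_scores(records)
for (a, b), value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{a} / {b}: {value:.3f}")
```

Pairs are keyed in alphabetical order, so a lookup uses `("Diabetes Mellitus", "Insulin")` rather than the reverse. A high score means two keywords appear together more often than their individual frequencies would predict, which is what makes such pairs candidates for a biomedical relation.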