Abstract

Background: Biomedical ontologies pose several challenges to ontology matching due both to the complexity of the biomedical domain and to the characteristics of the ontologies themselves. The biomedical tracks in the Ontology Alignment Evaluation Initiative (OAEI) have spurred the development of matching systems able to tackle these challenges, and benchmarked their general performance. In this study, we dissect the strategies employed by matching systems to tackle the challenges of matching biomedical ontologies and gauge the impact of the challenges themselves on matching performance, using the AgreementMakerLight (AML) system as the platform for this study.

Results: We demonstrate that the linear complexity of the hash-based searching strategy implemented by most state-of-the-art ontology matching systems is essential for matching large biomedical ontologies efficiently. We show that accounting for all lexical annotations (e.g., labels and synonyms) in biomedical ontologies leads to a substantial improvement in F-measure over using only the primary name, and that accounting for the reliability of different types of annotations generally also leads to a marked improvement. Finally, we show that cross-references are a reliable source of information and that, when using biomedical ontologies as background knowledge, it is generally more reliable to use them as mediators than to perform lexical expansion.

Conclusions: We anticipate that translating traditional matching algorithms to the hash-based searching paradigm will be a critical direction for the future development of the field. Improving the evaluation carried out in the biomedical tracks of the OAEI will also be important, as without proper reference alignments there is only so much that can be ascertained about matching systems or strategies. Nevertheless, it is clear that, to tackle the various challenges posed by biomedical ontologies, ontology matching systems must be able to efficiently combine multiple strategies into a mature matching approach.
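The hash-based searching strategy and the weighting of annotation types described in the Results can be illustrated with a minimal sketch. It is not AML's implementation: the function names, the annotation type names, the weight values, and the rule of scoring a match as the product of the two weights are all assumptions made for illustration. The point it shows is that indexing every name in a dictionary lets classes that share a name be found by lookup, so the work grows roughly linearly with the number of names rather than with the number of class pairs.

```python
from collections import defaultdict

# Hypothetical reliability weights per annotation type (illustrative values only).
WEIGHTS = {"label": 1.0, "exact_synonym": 0.9, "related_synonym": 0.7}


def normalize(name):
    """Lowercase and collapse underscores/whitespace so equal names hash equally."""
    return " ".join(name.replace("_", " ").lower().split())


def build_lexicon(ontology):
    """Index an ontology as {normalized name: {class id: best weight}} in one pass."""
    lexicon = defaultdict(dict)
    for class_id, annotations in ontology.items():
        for kind, name in annotations:
            key = normalize(name)
            weight = WEIGHTS[kind]
            if weight > lexicon[key].get(class_id, 0.0):
                lexicon[key][class_id] = weight
    return lexicon


def hash_match(source, target):
    """Match classes that share a name using dictionary lookups only."""
    src, tgt = build_lexicon(source), build_lexicon(target)
    alignment = {}
    for name, src_classes in src.items():
        tgt_classes = tgt.get(name)  # average O(1) lookup, no pairwise scan
        if tgt_classes is None:
            continue
        for s, ws in src_classes.items():
            for t, wt in tgt_classes.items():
                score = ws * wt  # assumed scoring rule: product of weights
                if score > alignment.get((s, t), 0.0):
                    alignment[(s, t)] = score
    return alignment


if __name__ == "__main__":
    # Toy ontologies: class id -> list of (annotation type, name).
    source = {
        "S1": [("label", "heart"), ("exact_synonym", "cardiac organ")],
        "S2": [("label", "lung")],
    }
    target = {
        "T1": [("label", "Heart")],
        "T2": [("related_synonym", "Lung")],
    }
    print(hash_match(source, target))
```

In this toy run, the pair matched through two primary labels receives a higher score than the pair matched through a related synonym, which is the intuition behind accounting for the reliability of different annotation types.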

Highlights

  • Biomedical ontologies pose several challenges to ontology matching due both to the complexity of the biomedical domain and to the characteristics of the ontologies themselves

  • We use AgreementMakerLight (AML) [11] as a platform for the study, as it meets three critical criteria: it is one of the top performing systems in the biomedical tracks of the Ontology Alignment Evaluation Initiative (OAEI) [12] and represents the state of the art; it was designed for matching biomedical ontologies and to tackle most of the challenges involved therein; and it has a modular architecture, which is essential to enable the type of analysis we aim to conduct in this study

  • The rest of the manuscript is organized as follows: in the “Related work” section we review how matching systems participating in the OAEI have tackled the challenges of matching biomedical ontologies; in the Methods, we provide a brief overview of AML, make an in-depth analysis of the strategies AML and other top-performing matching systems employ to tackle biomedical ontologies, and describe the datasets and experimental setting; in the Results and Discussion we dissect the impact of several of the strategies implemented by AML on its effectiveness and efficiency; and in the Conclusions, we provide an overarching view of the study and ponder on the aspects where the state of the art in matching biomedical ontologies can be improved


Summary

Introduction

Biomedical ontologies pose several challenges to ontology matching due both to the complexity of the biomedical domain and to the characteristics of the ontologies themselves. Many of the most widely used biomedical ontologies have tens of thousands of classes (e.g., the Gene Ontology, the Uber Anatomy Ontology, or UBERON) or even hundreds of thousands (e.g., the SNOMED Clinical Terms, the Chemical Entities of Biological Interest Ontology). Handling such large ontologies presents computational challenges throughout the ontology matching pipeline.

In addition to the traditional use of background knowledge ontologies as mediators, AML can use them for lexical expansion, i.e., to generate new synonyms in the input ontologies. We evaluated AML’s full matching pipeline with the background knowledge matching component modified appropriately to cover all four combinations of these two factors. We carried out this evaluation on the Anatomy and FMA-NCI small tasks, as these are the only tasks in which the coverage of the available cross-references from UBERON is comparable to its lexical coverage, and for which comparing the two information sources would therefore be fair. If the intended scope of our alignment were human health, we would likely be better off using lexical information, but if it were broader, the UBERON cross-references would be best.
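The difference between using a background knowledge ontology as a mediator and using it for lexical expansion can be made concrete with a small sketch. This is an illustration under stated assumptions, not AML's code: the function names and the data layout are hypothetical, and the mappings from input classes to background classes are taken as given, however they were obtained (e.g., from cross-references or from name matching).

```python
def compose_via_mediator(source_to_bk, target_to_bk):
    """Mediation: pair source and target classes mapped to the same background class."""
    bk_to_target = {}
    for t, bk in target_to_bk.items():
        bk_to_target.setdefault(bk, set()).add(t)
    return {
        (s, t)
        for s, bk in source_to_bk.items()
        for t in bk_to_target.get(bk, ())
    }


def expand_lexicon(names, class_to_bk, bk_synonyms):
    """Lexical expansion: copy the background class's synonyms into each mapped class."""
    expanded = {c: set(n) for c, n in names.items()}
    for c, bk in class_to_bk.items():
        expanded.setdefault(c, set()).update(bk_synonyms.get(bk, ()))
    return expanded


def match_by_name(source_names, target_names):
    """Direct matching: pair classes that share at least one name."""
    index = {}
    for t, names in target_names.items():
        for n in names:
            index.setdefault(n, set()).add(t)
    return {
        (s, t)
        for s, names in source_names.items()
        for n in names
        for t in index.get(n, ())
    }


if __name__ == "__main__":
    # Contrived toy data; "B1" stands for a class in the background ontology.
    source_names = {"S1": {"heart"}}
    target_names = {"T1": {"cardiac organ"}, "T2": {"pump"}}
    source_to_bk = {"S1": "B1"}
    target_to_bk = {"T1": "B1"}
    bk_synonyms = {"B1": {"heart", "cardiac organ", "pump"}}

    print(compose_via_mediator(source_to_bk, target_to_bk))
    expanded = expand_lexicon(source_names, source_to_bk, bk_synonyms)
    print(match_by_name(expanded, target_names))
```

With this contrived data, mediation yields only the pair that shares a background class, while expansion also pairs the source class with a target class that merely shares an imported synonym. The example is constructed to show how the two modes can diverge; it is not evidence for either one.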
