Abstract

Motivation
Over the past years, significant resources have been invested into formalizing biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns and encode domain background knowledge. The domain knowledge of biomedical ontologies may also have the potential to provide background knowledge for machine learning and predictive modelling.

Results
We use ontology-based machine learning methods to evaluate the contribution of formal axioms and ontology meta-data to the prediction of protein–protein interactions and gene–disease associations. We find that the background knowledge provided by the Gene Ontology and other ontologies significantly improves the performance of ontology-based prediction models. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute to improving data analysis in a context-specific manner. Our results have implications for the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are being applied more frequently. Our findings motivate the need for further development, and for the systematic, application-driven evaluation and improvement, of formal axioms in ontologies.

Availability and implementation
https://github.com/bio-ontology-research-group/tsoe

Supplementary information
Supplementary data are available at Bioinformatics online.

Highlights

  • Biomedical ontologies are widely used to formally represent the classes and relations within a domain and to provide a structured, controlled vocabulary for the annotations of biological entities (Smith et al, 2007)

  • While axioms are mainly exploited by automated tools and methods, ontologies also contain labels, synonyms and definitions (Hoehndorf et al, 2015b). Improving these human-accessible components has been a major focus of ontology development (Kohler et al, 2006); for example, ‘good’ natural language definitions and adequate labels are a requirement for biomedical ontologies in the Open Biomedical Ontologies Foundry (Smith et al, 2007), an initiative to collaboratively develop a set of reference ontologies in the biomedical domain

  • We do not alter the biological data used for training and evaluation; we only alter the background knowledge encoded in ontologies, using a set of data-driven methods that encode entities with their ontology-based annotations, together with the ontologies and their axioms, within vector spaces
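The idea that ontology axioms act as background knowledge for comparing annotated entities can be illustrated with a deliberately simplified sketch. It does not reproduce the vector-space embedding methods used in the study. Instead it assumes a hypothetical toy ontology (the class names are illustrative, not actual Gene Ontology terms), encodes each entity as a binary vector over the ontology's classes, and optionally closes each entity's annotations under SubClassOf axioms before comparing two entities by cosine similarity, which stands in here for a learned representation.

```python
from math import sqrt

# Hypothetical toy ontology: each class maps to its direct superclasses
# (SubClassOf axioms). Names are illustrative, not real GO terms.
ONTOLOGY = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "protein_binding": ["binding"],
    "kinase_binding": ["protein_binding"],
    "receptor_binding": ["protein_binding"],
}


def superclasses(cls, ontology):
    """Return cls together with all of its ancestors reachable via SubClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, []))
    return seen


def encode(annotations, ontology, use_axioms=True):
    """Binary vector over ontology classes; if use_axioms, annotations are propagated to ancestors."""
    active = set()
    for annotation in annotations:
        active |= superclasses(annotation, ontology) if use_axioms else {annotation}
    return [1.0 if c in active else 0.0 for c in sorted(ontology)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Two hypothetical proteins annotated to sibling classes.
p1, p2 = {"kinase_binding"}, {"receptor_binding"}
with_axioms = cosine(encode(p1, ONTOLOGY), encode(p2, ONTOLOGY))
without_axioms = cosine(
    encode(p1, ONTOLOGY, use_axioms=False),
    encode(p2, ONTOLOGY, use_axioms=False),
)
print(with_axioms, without_axioms)  # 0.75 0.0
```

Without the SubClassOf axioms the two proteins share no features and look unrelated; with them, the shared ancestors make them similar. This mirrors, in miniature, the kind of contribution of axioms that the study measures, while the biological annotations themselves stay unchanged.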


Summary

Introduction

Biomedical ontologies are widely used to formally represent the classes and relations within a domain and to provide a structured, controlled vocabulary for the annotations of biological entities (Smith et al, 2007). Significant efforts have been made to enrich ontologies by incorporating formalized background knowledge as well as meta-data that improve accessibility and utility of the ontologies (Mungall et al, 2011; Smith et al, 2007). The amount of information contained in ontologies, and the rigour with which this information has been created, verified and represented, may improve domain-specific data analysis through the provision of background knowledge (Garcez and Lamb, 2004).

