Abstract

Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are used in almost every major biological database. More recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods for combining ontologies and machine learning are still novel and under active development. We provide an overview of the methods that use ontologies to compute similarity and that incorporate them into machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.



Summary

Introduction

Machine learning methods are widely applied across the life sciences to develop predictive models [1].

Although the vocabulary of an ontology O may be large, comprising thousands of class, relation, and individual symbols, the embedding function f_e usually maps these entities into a space of relatively small dimension (determined by the chosen parameter n). The embedding preserves certain structural characteristics of O, similar to a ‘module’ [83] in the ontology, and thereby makes this local information available to an optimization algorithm that finds c. Finally, embeddings in R^n allow the direct application of gradient descent, which many modern machine learning methods rely on.

Traditional semantic similarity measures, in particular Resnik’s measure [53], perform well across many evaluations, especially in recall at the first ranks, and often have better performance than
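To make the notion of a semantic similarity measure concrete, the following is a minimal sketch of Resnik's measure on a toy ontology. It is an illustration under stated assumptions, not the implementation used in the paper or its notebooks: the class names, the `PARENTS` hierarchy, the `ANNOTATIONS` table, and all function names are hypothetical. The sketch assumes that the information content of a class is the negative log of the fraction of annotated entities that carry the class or one of its subclasses, and that the similarity of two classes is the highest information content among their shared superclasses.

```python
import math
from collections import Counter

# Hypothetical toy ontology: class -> direct superclasses (is-a edges).
PARENTS = {
    "process": [],
    "metabolism": ["process"],
    "signaling": ["process"],
    "lipid_metabolism": ["metabolism"],
    "glucose_metabolism": ["metabolism"],
}

# Hypothetical annotation corpus: entity -> directly annotated classes.
ANNOTATIONS = {
    "geneA": ["lipid_metabolism"],
    "geneB": ["glucose_metabolism"],
    "geneC": ["signaling"],
    "geneD": ["lipid_metabolism", "signaling"],
}


def ancestors(cls):
    """Return cls together with all of its (transitive) superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(c) = log(N / n_c), where n_c counts entities annotated with c
    or any subclass of c, and N is the number of annotated entities."""
    counts = Counter()
    for classes in ANNOTATIONS.values():
        closure = set()
        for c in classes:
            closure |= ancestors(c)  # propagate annotations upward
        counts.update(closure)
    total = len(ANNOTATIONS)
    return {c: math.log(total / n) for c, n in counts.items()}


def resnik(a, b, ic):
    """Resnik similarity: IC of the most informative common ancestor.
    Classes without any annotation are skipped (an assumption of this sketch)."""
    common = ancestors(a) & ancestors(b)
    return max((ic[c] for c in common if c in ic), default=0.0)


IC = information_content()
print(resnik("lipid_metabolism", "glucose_metabolism", IC))
print(resnik("lipid_metabolism", "signaling", IC))
```

In this toy setting the two metabolism classes share the superclass `metabolism`, so their similarity is its information content, whereas `lipid_metabolism` and `signaling` share only the root class, which every entity is annotated with and which therefore carries no information.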

