Abstract

A typical use case of ontologies is the calculation of similarity scores between items that are annotated with classes of the ontology. For example, in differential diagnostics and disease gene prioritization, the human phenotype ontology (HPO) is often used to compare a query phenotype profile against gold-standard phenotype profiles of diseases or genes. The latter have long been constructed as flat lists of ontology classes, which, as we show in this work, can be improved by exploiting existing structure and information in annotation datasets or full-text disease descriptions. We derive a study-wise annotation model of diseases and genes and show that this can improve the performance of semantic similarity measures. Inferred weights of individual annotations are one reason for this improvement, but, more importantly, using the study-wise structure further boosts the results of the algorithms according to precision-recall analyses. We test the study-wise annotation model for diseases annotated with classes from the HPO and for genes annotated with gene ontology (GO) classes. We incorporate this annotation model into similarity algorithms and show how this leads to improved performance. This work adds weight to the need for enhancing simple list-based representations of disease or gene annotations. We show how study-wise annotations can be automatically derived from full-text summaries of disease descriptions and from the annotation data provided by the GO Consortium, and how semantic similarity measures can utilize this extended annotation model. Database URL: https://phenomics.github.io/

Highlights

  • Ontologies have become a widely used tool to capture knowledge about objects in biology, genomics and medicine

  • We show how study-wise annotations can be automatically derived from full-text summaries of disease descriptions and from the annotation data provided by the gene ontology (GO) Consortium, and how semantic similarity measures can utilize this extended annotation model

  • Multiple studies underlie such an annotation set, and we have inferred a study-wise annotation model for diseases annotated with classes of the human phenotype ontology (HPO) and for genes annotated with classes of the GO biological process (BP) ontology


Summary

Introduction

Ontologies have become a widely used tool to capture knowledge about objects in biology, genomics and medicine. Besides enabling knowledge integration and retrieval, they are commonly used to calculate similarity between items that have been described (annotated) with classes of an ontology [1]. Reliable ontology-based similarity measures are important, as they form the basis of several applications, including differential diagnostics [2], disease gene finding [3] and gene function prediction [4]. Ontology-based similarity measures allow non-perfect matches between ontology classes to be quantified by incorporating the graph structure of the ontology.
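To make the idea of quantifying non-perfect matches concrete, the following is a minimal sketch of one well-known graph-based measure, Resnik similarity, which scores two classes by the information content of their most informative common ancestor. It is not taken from this work: the toy ontology, the class names, the item names and the helper functions are all hypothetical and chosen only for illustration, and the flat annotation lists used here are exactly the simple representation that the study-wise model aims to improve on.

```python
import math

# Hypothetical toy ontology: each class maps to its direct parents (is-a edges).
PARENTS = {
    "root": [],
    "abnormal_heart": ["root"],
    "abnormal_eye": ["root"],
    "septal_defect": ["abnormal_heart"],
    "atrial_septal_defect": ["septal_defect"],
    "ventricular_septal_defect": ["septal_defect"],
    "cataract": ["abnormal_eye"],
}

# Hypothetical flat annotation lists: item -> directly annotated classes.
ANNOTATIONS = {
    "disease_a": {"atrial_septal_defect"},
    "disease_b": {"ventricular_septal_defect"},
    "disease_c": {"cataract"},
    "disease_d": {"septal_defect", "cataract"},
}


def ancestors(cls):
    """Return the class itself together with all of its ancestors."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content(cls):
    """Negative log of the fraction of items annotated with cls or a descendant.

    Assumes every class is used by at least one item, which holds for the
    toy data above.
    """
    count = sum(
        1
        for classes in ANNOTATIONS.values()
        if any(cls in ancestors(c) for c in classes)
    )
    return math.log(len(ANNOTATIONS) / count)


def resnik(cls_a, cls_b):
    """Information content of the most informative common ancestor."""
    common = ancestors(cls_a) & ancestors(cls_b)
    return max(information_content(c) for c in common)


if __name__ == "__main__":
    print(resnik("atrial_septal_defect", "ventricular_septal_defect"))
    print(resnik("atrial_septal_defect", "cataract"))
```

In this sketch the two septal defects do not match exactly, yet they receive a positive score because they share the specific ancestor `septal_defect`, whereas a septal defect and a cataract share only the root and score zero.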

