Abstract

Motivation: To create controlled vocabularies for shared use across biomedical domains, the bioinformatics community has created a large number of biomedical ontologies, such as Disease Ontology (DO) and Human Phenotype Ontology (HPO). Quantitative measures of the associations among diseases can give researchers deeper insight into human diseases: similar diseases usually have similar molecular origins or similar phenotypes, so measuring similarity helps reveal the common attributes of diseases and improve the corresponding diagnoses and treatment plans. Several approaches have been proposed in the past few years to measure disease similarity using a particular biomedical ontology. However, for a newly discovered disease, or a disease with little related genetic information in Disease Ontology (i.e., a disease with few disease-gene associations), these approaches usually do not compute disease similarity jointly by integrating gene and phenotype associations.

Results: In this paper, we propose a novel method called GPSim to effectively deduce the semantic similarity of diseases. In particular, GPSim calculates the similarity by jointly utilizing gene, disease and phenotype associations extracted from multiple biomedical ontologies and databases. We also explore phenotypic factors that affect the evaluation performance, such as the depth of HPO terms and the number of phenotypic associations. A final experimental evaluation of GPSim shows its advantages over previous approaches.

Highlights

  • The emergence of massive biomedical data offers a marvelous opportunity for life science research and modern disease diagnosis

  • We explore phenotypic factors, including the depth of HPO terms and the number of disease-phenotype associations, when each disease has few disease-gene associations, and we compare GPSim with previous disease similarity measurement methods, including Resnik (Resnik, 1995), Zhang (Zhang et al, 2010), BOG (Mathur and Dinakarpandian, 2012) and SemFunSim (Cheng et al, 2014)

  • The receiver operating characteristic (ROC) curve plots the true positive rate (TPR) on the Y axis against the false positive rate (FPR) on the X axis over a series of different dichotomies, i.e. classification thresholds
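
The ROC construction described in the last highlight can be illustrated with a minimal sketch. It assumes each candidate disease pair has a similarity score and a binary label (1 for a known similar pair, 0 otherwise); the function names `roc_points` and `auc` and the toy data are hypothetical and are not taken from the GPSim paper.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, one per distinct score threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank pairs from highest to lowest score; lowering the threshold
    # past a score turns that pair into a predicted positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all pairs tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (made up for illustration): six disease pairs.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_curve = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_curve)
```

Each threshold yields one dichotomy of the pairs into predicted similar and predicted dissimilar, and therefore one (FPR, TPR) point. The area under the resulting curve is a common single-number summary for comparing methods on the same benchmark.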


Summary

Introduction

The emergence of massive biomedical data offers a marvelous opportunity for life science research and modern disease diagnosis. To create controlled vocabularies for the shared use of knowledge, the bioinformatics community has created a large number of biomedical ontologies, such as Disease Ontology [DO (Schriml et al, 2012; Kibbe et al, 2014)] and Human Phenotype Ontology [HPO (Köhler et al, 2014)]. These ontologies provide concepts and make innovative contributions to advance the understanding of human diseases with controllable terminology. They have been used in a variety of biomedical applications. By using DO, researchers build the chain knowledge base of etiology (Harrow et al, 2017; Kozaki et al, 2017) and annotate human genes to improve the coverage of disease gene annotations (Osborne et al, 2009).

