Abstract

Motivation

In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about the phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse.

Results

We developed SmuDGE, a method that uses feature learning to generate vector-based representations of the phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (for example, between a disease and a gene, or between a disease and a patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks composed of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and that it significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.

Availability and implementation

https://github.com/bio-ontology-research-group/SmuDGE
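To illustrate the general idea of comparing entities through vector representations rather than directly through shared phenotype annotations, here is a minimal sketch that ranks candidate genes by cosine similarity to a disease vector. The vectors, gene names, and the helper `rank_by_cosine` are hypothetical placeholders. SmuDGE learns its representations and uses a trainable similarity measure, which this fixed cosine sketch does not reproduce.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_cosine(query: list[float], candidates: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Hypothetical helper: return (name, score) pairs sorted by decreasing similarity to query."""
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 3-dimensional vectors, for illustration only (not learned by SmuDGE).
disease_vec = [0.9, 0.1, 0.0]
gene_vecs = {
    "GENE_A": [0.8, 0.2, 0.1],
    "GENE_B": [0.0, 0.1, 0.9],
    "GENE_C": [0.5, 0.5, 0.0],
}

if __name__ == "__main__":
    for gene, score in rank_by_cosine(disease_vec, gene_vecs):
        print(f"{gene}\t{score:.3f}")
```

Because ranking in this setting needs only a vector per entity, any gene that has a representation can be scored, including a gene without direct phenotype annotations whose vector was derived from its network context. This is the property the abstract refers to when it describes extending coverage to all genes in a connected interaction network.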

Highlights

  • We use gene-to-phenotype associations observed in mutant mouse models, downloaded from the Mouse Genome Informatics (MGI) database (Blake et al, 2014) on 8 Jun 2018, and gene-to-phenotype associations derived from gene–disease associations and provided by the Human Phenotype Ontology (HPO) database, also downloaded on 8 Jun 2018.

Summary

Introduction

There is a large number of available methods for the prioritization or prediction of gene–disease associations (Natarajan and Dhillon, 2014; Wang et al, 2011; Zhou and Skolnick, 2016). Computational methods that predict gene–disease associations use a wide range of features and approaches. Several approaches to the computational prediction of gene–disease associations are based on the guilt-by-association principle (Gillis and Pavlidis, 2012). The guilt-by-association approach relies on prior knowledge of a set of genes associated with a disease D and on a relatedness measure that compares a gene with the set of genes associated with D; if a gene is strongly related to this set with respect to the relatedness measure, it is suggested as a novel candidate gene. Because guilt-by-association methods rely on prior knowledge of disease-associated genes, they cannot be applied to monogenic diseases, and their application is, in general, limited to a small number of diseases. Several measures are used to determine relatedness between genes, the most prominent relying on network associations (Aerts, 2006; Kohler et al, 2008; Lee et al, 2011) or on some form of functional or phenotypic similarity (Schlicker and Albrecht, 2008).
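The guilt-by-association principle described above can be sketched with a deliberately simple network-based relatedness measure: each candidate gene is scored by the fraction of its interaction partners that are already known to be associated with disease D. This neighbour-counting measure, the toy network, and the helpers `build_graph` and `gba_scores` are hypothetical illustrations; the cited network methods use more elaborate measures.

```python
from collections import defaultdict


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from (gene, gene) interaction pairs."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def gba_scores(graph: dict[str, set[str]], disease_genes: set[str]) -> dict[str, float]:
    """Score every gene not yet linked to D by the fraction of its neighbours
    that are known disease genes, sorted from most to least related."""
    scores = {}
    for gene, neighbours in graph.items():
        if gene in disease_genes or not neighbours:
            continue  # skip seed genes and isolated nodes
        hits = len(neighbours & disease_genes)
        scores[gene] = hits / len(neighbours)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical interaction network and seed genes for a disease D.
edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]
known_for_D = {"G1", "G2"}

if __name__ == "__main__":
    graph = build_graph(edges)
    print(gba_scores(graph, known_for_D))
```

With an empty seed set every score is zero, which mirrors the limitation stated above: without prior knowledge of disease-associated genes, this kind of measure cannot distinguish candidates.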
