Predicting candidate genes from phenotypes, functions and anatomical site of expression.

Jun Chen, Azza Althagafi, Robert Hoehndorf, Robinson Peter

doi:10.1093/bioinformatics/btaa879

Abstract

Motivation: Over the past years, many computational methods have been developed to incorporate information about phenotypes into the disease–gene prioritization task. These methods generally compute the similarity between a patient's phenotypes and a database of gene–phenotype associations to find the most phenotypically similar match. The main limitation of these methods is their reliance on knowledge about the phenotypes associated with particular genes, which is incomplete in humans as well as in many model organisms, such as the mouse and fish. Information about the functions of gene products and the anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine-learning models.

Results: We developed a novel graph-based machine-learning method for biomedical ontologies that is able to exploit axioms in ontologies and other graph-structured data. Using this method, we embed genes based on their associated phenotypes, the functions of their gene products and the anatomical location of their expression. We then developed a machine-learning model to predict gene–disease associations based on the associations between genes and multiple biomedical ontologies; this model significantly improves over state-of-the-art methods. Furthermore, we significantly extend phenotype-based gene prioritization to all genes that are associated with phenotypes, functions or sites of expression.

Availability and implementation: Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec.

Supplementary information: Supplementary data are available at Bioinformatics online.
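The phenotype-matching step described in the Motivation, and the coverage gap it leaves, can be illustrated with a minimal sketch. It assumes plain Jaccard overlap between sets of phenotype identifiers, whereas published tools typically use ontology-aware semantic similarity; all gene and class identifiers below are hypothetical placeholders, not real annotations.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy annotations; identifiers are placeholders, not real ontology terms.
GENE_PHENOTYPES: Dict[str, Set[str]] = {
    "GENE_A": {"HP:1", "HP:2", "HP:3"},
    "GENE_B": {"HP:2"},
    "GENE_C": set(),  # no known phenotypes
}
# Hypothetical function (GO) and expression-site (UBERON) annotations for the same genes.
GENE_FUNCTION_AND_EXPRESSION: Dict[str, Set[str]] = {
    "GENE_A": {"GO:1"},
    "GENE_B": {"GO:1", "UBERON:1"},
    "GENE_C": {"GO:2", "UBERON:1"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_phenotypes(patient: Set[str]) -> List[Tuple[str, float]]:
    """Rank genes by the overlap between the patient's phenotypes and each gene's phenotypes."""
    scores = [(gene, jaccard(patient, phens)) for gene, phens in GENE_PHENOTYPES.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def genes_missed_by_phenotypes() -> List[str]:
    """Genes with no phenotype annotations that do have function or expression annotations."""
    return sorted(
        gene
        for gene, phens in GENE_PHENOTYPES.items()
        if not phens and GENE_FUNCTION_AND_EXPRESSION.get(gene)
    )


print(rank_by_phenotypes({"HP:1", "HP:2"}))
print(genes_missed_by_phenotypes())
```

For a patient with phenotypes HP:1 and HP:2, GENE_A scores 2/3, GENE_B scores 1/2 and GENE_C scores 0. GENE_C carries function and expression annotations but no phenotype annotations, so a phenotype-only score gives it no signal at all; this is the gap that using functions and expression sites is meant to close.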

Highlights

Understanding the molecular mechanisms underlying a set of abnormal phenotypes is important for diagnosis, prevention, and development of therapies
A machine-learning model predicts gene–disease associations based on the associations between genes and multiple biomedical ontologies, and this model significantly improves over state-of-the-art methods
We hypothesize that incorporating these indirect associations will allow us to better utilize the background knowledge contained in the ontologies and further improve predictive performance
We develop a novel embedding approach that aims to improve the embeddings of ontologies with many complex axioms, as well as the embeddings of entities annotated with classes that do not stand in a subclass relation but are related through more complex axioms
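The idea of connecting classes through complex axioms rather than only subclass relations can be sketched by turning axioms into labelled graph edges and then sampling random walks over the graph. This is an illustration under stated assumptions, not the DL2Vec implementation: it handles only two axiom shapes (C ⊑ D and C ⊑ ∃R.D), traverses every edge in both directions, and uses hypothetical class and gene identifiers. In embedding approaches of this kind, the walks would typically be passed to a word-embedding model such as Word2Vec, which is not included here.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)
Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(relation, neighbour)]


def axioms_to_edges(axioms: List[tuple]) -> List[Edge]:
    """Turn two simple axiom shapes into labelled edges (assumed, simplified rules).

    ("SubClassOf", C, D)         for C ⊑ D     -> (C, "subClassOf", D)
    ("SubClassOfSome", C, R, D)  for C ⊑ ∃R.D  -> (C, R, D)
    Any other axiom shape is skipped in this sketch.
    """
    edges: List[Edge] = []
    for ax in axioms:
        if ax[0] == "SubClassOf" and len(ax) == 3:
            edges.append((ax[1], "subClassOf", ax[2]))
        elif ax[0] == "SubClassOfSome" and len(ax) == 4:
            edges.append((ax[1], ax[2], ax[3]))
    return edges


def build_graph(edges: List[Edge]) -> Graph:
    """Adjacency list; each edge is stored in both directions (an assumption)."""
    graph: Graph = defaultdict(list)
    for src, rel, dst in edges:
        graph[src].append((rel, dst))
        graph[dst].append((rel, src))
    return dict(graph)


def random_walks(graph: Graph, start: str, n_walks: int, length: int, seed: int = 0) -> List[List[str]]:
    """Uniform random walks recorded as token sequences: node, relation, node, ..."""
    rng = random.Random(seed)
    walks: List[List[str]] = []
    for _ in range(n_walks):
        node, walk = start, [start]
        for _ in range(length):
            neighbours = graph.get(node)
            if not neighbours:
                break  # dead end: stop this walk early
            rel, node = rng.choice(neighbours)
            walk.extend([rel, node])
        walks.append(walk)
    return walks


# Hypothetical axioms: a phenotype class and a function class that are not subclasses
# of each other but both refer to the same anatomy class.
AXIOMS = [
    ("SubClassOfSome", "HP:heart_abnormality", "has_part", "UBERON:heart"),
    ("SubClassOfSome", "GO:heart_development", "results_in_development_of", "UBERON:heart"),
    ("DisjointClasses", "UBERON:heart", "UBERON:liver"),  # shape not handled: skipped
]
# Hypothetical gene annotations, added as ordinary edges.
ANNOTATIONS: List[Edge] = [
    ("GENE_X", "has_phenotype", "HP:heart_abnormality"),
    ("GENE_Y", "has_function", "GO:heart_development"),
]

GRAPH = build_graph(axioms_to_edges(AXIOMS) + ANNOTATIONS)
WALKS = random_walks(GRAPH, "GENE_X", n_walks=5, length=4, seed=42)
```

Because both axiom edges touch UBERON:heart, a walk from GENE_X can pass through the phenotype class, the anatomy class and the function class to GENE_Y, so the two genes can share walk contexts even though their annotation classes stand in no subclass relation.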

Summary

Introduction

Understanding the molecular mechanisms underlying a set of abnormal phenotypes is important for diagnosis, prevention, and the development of therapies. Several computational methods have been developed to prioritize candidate genes for a particular disease or set of abnormal phenotypes (Tranchevent et al, 2016; Tomar et al, 2019; Guala and Sonnhammer, 2017; Zhang et al, 2018; Feng, 2017). Many such methods rely on identifying similarities between genes and suggest new candidates based on this similarity (Gillis and Pavlidis, 2012). The similarity can be computed from several known features of a gene, including phenotype associations (Greene et al, 2016), distance within an interaction network (Peng et al, 2018), or functional similarity (Liu et al, 2018; Schlicker and Albrecht, 2010).
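One of the gene features listed above, distance within an interaction network, can be sketched as ranking candidate genes by their shortest-path distance to known disease genes. This is a minimal illustration assuming an unweighted, undirected toy network with hypothetical gene names; it is not the scoring used by any of the cited methods.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Network = Dict[str, Set[str]]


def undirected(edges: Iterable[Tuple[str, str]]) -> Network:
    """Build a symmetric adjacency map from an edge list."""
    net: Network = {}
    for a, b in edges:
        net.setdefault(a, set()).add(b)
        net.setdefault(b, set()).add(a)
    return net


def bfs_distances(net: Network, source: str) -> Dict[str, int]:
    """Unweighted shortest-path lengths from source, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in net.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def rank_by_network_distance(net: Network, seeds: Set[str], candidates: List[str]) -> List[Tuple[str, float]]:
    """Score each candidate by its distance to the nearest seed (known disease gene).

    Smaller is better; candidates not connected to any seed get infinity.
    """
    best: Dict[str, float] = {c: float("inf") for c in candidates}
    for seed in seeds:
        dist = bfs_distances(net, seed)
        for c in candidates:
            if c in dist:
                best[c] = min(best[c], dist[c])
    return sorted(best.items(), key=lambda item: item[1])


# Hypothetical toy interaction network; GENE_4 has no recorded interactions.
NETWORK = undirected([("SEED_1", "GENE_1"), ("GENE_1", "GENE_2"), ("GENE_2", "GENE_3")])
RANKING = rank_by_network_distance(NETWORK, {"SEED_1"}, ["GENE_3", "GENE_1", "GENE_4", "GENE_2"])
print(RANKING)
```

GENE_1, GENE_2 and GENE_3 receive distances 1, 2 and 3, while GENE_4 cannot be scored at all because it has no interactions; this is a network analogue of the incomplete phenotype annotations discussed in the Abstract.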

Methods

Results

Discussion

Conclusion