Abstract

Protein interaction networks provide a powerful framework for identifying genes causal for complex genetic diseases. Here, we introduce a general framework, uKIN, that uses prior knowledge of disease-associated genes to guide, within known protein-protein interaction networks, random walks that are initiated from newly identified candidate genes. In large-scale testing across 24 cancer types, we demonstrate that our network propagation approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. We also apply our approach to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. uKIN is freely available for download at: https://github.com/Singh-Lab/uKIN.

Highlights

  • To showcase uKIN’s versatility, we show its effectiveness in identifying causal genes for three other complex diseases, where the genes known to be associated with the disease come from the Online Mendelian Inheritance in Man (OMIM) (OMIM, 2000) and genes comprising the new information arise from GWAS

  • We assume that prior knowledge about a disease is given by a set of genes already implicated as causal for that disease, and new information consists of genes that are potentially disease relevant

Summary

Introduction

Large-scale efforts such as the 1000 Genomes Project (1000 Genomes Project Consortium et al, 2015), The Cancer Genome Atlas (TCGA) (TCGA Research Network, n.d.), and the Genome Aggregation Database (Karczewski et al, 2019), among others, have cataloged millions of variants occurring in tens of thousands of healthy and disease genomes. Despite this abundance of genomic data, understanding the genetic basis underlying complex human diseases remains challenging (Kim and Przytycka, 2012). The signal from known disease genes can be "propagated" across a network to prioritize either all genes within the network or just candidate genes within a genomic locus where single nucleotide polymorphisms have been correlated with an increased susceptibility to disease (Chen et al, 2009; Erten et al, 2011; Kohler et al, 2008; Lundby et al, 2014; Navlakha and Kingsford, 2010; Smedley et al, 2014; Vanunu et al, 2010).
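To make the idea of propagating signal from known disease genes concrete, the following is a minimal sketch of a generic random walk with restart on a toy network. It is not the uKIN algorithm or any of the cited methods; the function name `propagate`, the gene labels `G1` to `G5`, the restart probability of 0.3, and the iteration count are all assumptions chosen for illustration, and the sketch assumes every gene has at least one neighbor.

```python
from typing import Dict, List, Set


def propagate(adjacency: Dict[str, List[str]], seeds: Set[str],
              restart: float = 0.3, iterations: int = 100) -> Dict[str, float]:
    """Score genes by a random walk with restart from the seed genes.

    Assumes an undirected network in which every node has at least
    one neighbor (hypothetical toy setting, not a real interactome).
    """
    nodes = sorted(adjacency)
    # The walker starts, and restarts, uniformly over the seed genes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # With probability `restart`, jump back to a seed gene.
        updated = {n: restart * start[n] for n in nodes}
        # Otherwise, move to a uniformly chosen neighbor.
        for node in nodes:
            neighbors = adjacency[node]
            share = (1.0 - restart) * scores[node] / len(neighbors)
            for neighbor in neighbors:
                updated[neighbor] += share
        scores = updated
    return scores


# Hypothetical toy network: a simple chain G1 - G2 - G3 - G4 - G5.
toy_network = {
    "G1": ["G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3", "G5"],
    "G5": ["G4"],
}

# Treat G1 as the only known disease gene and rank the rest by score.
scores = propagate(toy_network, seeds={"G1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy chain, genes closer to the seed receive higher scores, which is the intuition behind using propagated signal to prioritize candidates. Real protein interaction networks are far larger and denser, so degree effects and the choice of restart probability matter much more there.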
