Abstract

Summary

We present an accessible, fast, and customizable network propagation system for pathway boosting and interpretation of genome-wide association studies. This system, NAGA (Network Assisted Genomic Association), taps the NDEx biological network resource to gain access to thousands of protein networks and select those that are most relevant and best performing for a specific association study. The method is efficient, completing a genome-wide analysis in under 5 minutes on a modern laptop computer. We show that NAGA recovers many known disease genes from analysis of schizophrenia genetic data, and that it substantially boosts associations with previously unappreciated genes such as amyloid beta precursor. On this and seven other gene-disease association tasks, NAGA outperforms conventional approaches in recovery of known disease genes and replicability of results. Protein interactions associated with disease are visualized and annotated in Cytoscape, which, together with standard programmatic interfaces, supports downstream analysis.

Highlights

  • While genome-wide association studies (GWAS) have linked many genetic variants to complex diseases, the variants mapped so far account for only a small fraction of the total genetic variation affecting any given disease phenotype (Sullivan et al., 2018).

  • Our method starts with summary p values assigned by PLINK (Chang et al., 2015), SNPTEST (Marchini et al., 2007), or another standard GWAS analysis approach.

  • Network propagation spreads the gene scores to network neighbors, producing revised scores that are used to rank all genes in a final list. This ranked gene list can be validated and explored in several ways, including comparison against a gold-standard set of genes to establish that Network Assisted Genomic Association (NAGA) has enriched for biological processes of interest.
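
The propagation step described in the last highlight can be illustrated with a small sketch. The text here does not specify which propagation algorithm or parameters NAGA uses, so the code below assumes a generic random walk with restart. The gene names, the toy network, the p values, the `-log10(p)` scoring, and the `restart` value are all hypothetical choices made for illustration only.

```python
import numpy as np

def propagate(adjacency, seed_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected network (illustrative only)."""
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=0)
    degree[degree == 0] = 1.0          # avoid division by zero for isolated nodes
    W = A / degree                     # column-normalize: column j is divided by degree[j]
    f0 = np.asarray(seed_scores, dtype=float)
    f0 = f0 / f0.sum()                 # seed distribution sums to 1
    f = f0.copy()
    for _ in range(max_iter):
        # Spread scores to neighbors, then mix the original seeds back in.
        f_next = (1 - restart) * (W @ f) + restart * f0
        if np.abs(f_next - f).sum() < tol:
            return f_next
        f = f_next
    return f

# Hypothetical toy input: four genes on a path A - B - C - D.
genes = ["A", "B", "C", "D"]
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
# Hypothetical gene-level p values: only gene A is strongly associated.
p_values = np.array([1e-8, 0.5, 0.5, 0.5])
seed_scores = -np.log10(p_values)      # assumed scoring; larger means stronger evidence

smoothed = propagate(adjacency, seed_scores)
order = np.argsort(-smoothed)          # indices sorted by descending propagated score
ranked = [genes[i] for i in order]
print(ranked)
```

In this toy case genes B and D start with identical p values, but B ends up ranked above D because it is a direct neighbor of the strongly associated gene A. That is the intuition behind using a network to boost genes that fall below the significance threshold on their own.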

Introduction

While genome-wide association studies (GWAS) have linked many genetic variants to complex diseases, the variants mapped so far account for only a small fraction of the total genetic variation affecting any given disease phenotype (Sullivan et al., 2018). A common challenge is that these studies typically test millions of single nucleotide polymorphisms (SNPs) for disease association, making it difficult to distinguish the causal loci from the background statistical noise of other variants. This situation leads to the use of very stringent significance thresholds to identify associated variants (e.g., p value < 5 × 10⁻⁸), with the consequence that all but the strongest findings may be missed (Lander and Kruglyak, 1995). Integration of GWAS with protein-protein interactions (PPIs) and other types of molecular networks has recently gained attention as a way to help overcome the lack of statistical power in detecting gene-disease associations (Jia and Zhao, 2014).
