Learning gene networks underlying clinical phenotypes using SNP perturbation.

Calvin McCarter, Judie Howrylak, Seyoung Kim

doi:10.1371/journal.pcbi.1007940

Abstract

The availability of genome sequence, molecular, and clinical phenotype data for large patient cohorts, generated by recent technological advances, provides an opportunity to dissect the genetic architecture of complex diseases at the system level. However, previous analyses of such data have largely focused on the co-localization of SNPs associated with clinical and expression traits, identified by genome-wide association studies and expression quantitative trait locus mapping, respectively. Thus, their description of the molecular mechanisms behind the SNPs influencing clinical phenotypes was limited to the single gene linked to the co-localized SNP. Here we introduce PerturbNet, a statistical framework for learning gene networks that modulate the influence of genetic variants on phenotypes, using genetic variants as naturally occurring perturbations of a biological system. PerturbNet uses a probabilistic graphical model to directly model the cascade of perturbation from genetic variants to the gene network and then to the phenotype network, along with the network within each layer of the biological system. PerturbNet learns the entire model by solving a single optimization problem with an efficient algorithm that can analyze human genome-wide data within a few hours. PerturbNet inference procedures extract a detailed description of how the gene network modulates the genetic effects on phenotypes. Using simulated and asthma data, we demonstrate that PerturbNet improves statistical power for detecting disease-linked SNPs and identifies gene networks and network modules mediating the SNP effects on traits, providing deeper insights into the underlying molecular mechanisms.
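To make the cascade concrete, the sketch below assumes that each layer is a Gaussian conditional graphical model. This is an assumption for illustration only: the abstract says just "probabilistic graphical model", and the sketch shows only the forward model, not how PerturbNet learns it. Under this assumption, the gene network is a precision matrix `lam_yy`, SNP perturbations of genes are weights `theta_xy`, and E[y | x] = -x `theta_xy` `lam_yy`^-1; the trait layer is defined the same way given the genes. Composing the two layers gives SNP-to-trait effects, and because the composition is a matrix product it splits exactly into per-gene contributions. All function names, parameters, and values are hypothetical.

```python
import numpy as np


def mean_map(theta, lam):
    """Coefficient matrix B with E[out | inp] = inp @ B, for an assumed Gaussian
    conditional graphical model p(out | inp) proportional to
    exp(-0.5 * out' lam out - inp' theta out).
    theta: (n_inp, n_out) perturbation weights; lam: (n_out, n_out) precision."""
    return -theta @ np.linalg.inv(lam)


def cascade_effects(theta_xy, lam_yy, theta_yz, lam_zz):
    """Compose SNP->gene and gene->trait maps into SNP->trait effects."""
    b_xy = mean_map(theta_xy, lam_yy)  # SNPs x genes
    b_yz = mean_map(theta_yz, lam_zz)  # genes x traits
    return b_xy, b_yz, b_xy @ b_yz     # last term: SNPs x traits


def per_gene_contribution(b_xy, b_yz, snp, trait):
    """Additive per-gene terms of one SNP's effect on one trait.
    They sum exactly to (b_xy @ b_yz)[snp, trait]."""
    return b_xy[snp, :] * b_yz[:, trait]


def sample_cascade(x, theta_xy, lam_yy, theta_yz, lam_zz, rng):
    """Forward-simulate the cascade: genes y | SNPs x, then traits z | genes y."""
    n = x.shape[0]
    cov_y = np.linalg.inv(lam_yy)  # gene-network precision -> covariance
    cov_z = np.linalg.inv(lam_zz)  # trait-network precision -> covariance
    y = x @ mean_map(theta_xy, lam_yy) + rng.multivariate_normal(
        np.zeros(cov_y.shape[0]), cov_y, size=n)
    z = y @ mean_map(theta_yz, lam_zz) + rng.multivariate_normal(
        np.zeros(cov_z.shape[0]), cov_z, size=n)
    return y, z


# Hypothetical toy model: 3 SNPs, 4 genes in a chain network, 2 traits.
lam_yy = np.array([[2.0, -0.5, 0.0, 0.0],
                   [-0.5, 2.0, -0.5, 0.0],
                   [0.0, -0.5, 2.0, -0.5],
                   [0.0, 0.0, -0.5, 2.0]])
lam_zz = np.array([[1.0, 0.3],
                   [0.3, 1.0]])
theta_xy = np.zeros((3, 4))
theta_xy[0, 0] = -1.0  # SNP 0 perturbs gene 0 only
theta_xy[1, 2] = -0.8  # SNP 1 perturbs gene 2 only; SNP 2 perturbs nothing
theta_yz = np.zeros((4, 2))
theta_yz[1, 0] = -0.7  # gene 1 is linked to trait 0
theta_yz[3, 1] = -0.6  # gene 3 is linked to trait 1

b_xy, b_yz, b_xz = cascade_effects(theta_xy, lam_yy, theta_yz, lam_zz)
# SNP 0 perturbs only gene 0, yet reaches trait 0 mainly through gene 1.
contrib = per_gene_contribution(b_xy, b_yz, snp=0, trait=0)

# Forward-simulate 50,000 hypothetical individuals (genotypes coded 0/1/2).
rng = np.random.default_rng(0)
x = rng.binomial(2, 0.3, size=(50_000, 3)).astype(float)
y, z = sample_cascade(x, theta_xy, lam_yy, theta_yz, lam_zz, rng)
b_hat = np.linalg.lstsq(x, z, rcond=None)[0]  # approximately b_xz
```

The toy example shows the point the abstract makes: a SNP's effect on a trait can be carried by a gene it does not perturb directly, and a co-localization analysis that links each SNP to a single gene would miss that path.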

Highlights

One of the key questions in biology is how genetic variants perturb gene regulatory systems to influence disease susceptibility in populations
We describe PerturbNet, a statistical framework for learning a gene network that modulates the influence of genetic variants on phenotypes, using genetic variants as naturally occurring perturbations of a biological system
PerturbNet directly models the cascade of perturbation from genetic variants to the gene network and then to the phenotype network, integrating, within a single statistical framework, the analyses performed by existing computational tools for expression quantitative trait locus (eQTL) mapping, genome-wide association studies (GWAS), co-localization of eQTL and GWAS variants, and gene network discovery under single nucleotide polymorphism (SNP) perturbation

Summary

Introduction

One of the key questions in biology is how genetic variants perturb gene regulatory systems to influence disease susceptibility in populations. Existing computational methods for integrating expression quantitative trait locus (eQTL) and clinical phenotype data cannot learn a gene network that acts as a mediator, receiving SNP perturbations and passing their effects on to clinical traits. Many of these methods disregard gene networks entirely and focus on the co-localization of an eQTL and a trait-associated SNP [10,11,12,13], identified separately by eQTL mapping [7,14,15,16] and by a genome-wide association study (GWAS) [17,18]. Bayesian networks have been used to learn gene regulatory networks and to build predictive models of disease from these networks [24]; however, this approach relied on an elaborate multi-step analysis pipeline to identify disease-related gene modules and genetic variants, which led to a loss of statistical power.
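For contrast, the two-step co-localization analysis described above can be sketched as follows: test each SNP for association with one gene's expression (eQTL mapping), test it separately for association with the clinical trait (GWAS), and call a SNP co-localized if it passes both. This is a deliberately simplified illustration, not any specific published method: the per-SNP linear-regression tests, the normal approximation to the t distribution, the fixed cutoff `alpha`, and the function names are all assumptions, and the cited co-localization methods [10,11,12,13] are more elaborate.

```python
import math

import numpy as np


def marginal_pvalues(genotypes, phenotype):
    """Two-sided p-value for each SNP from a simple linear regression of the
    phenotype on that SNP alone, using a normal approximation (large n)."""
    n = len(phenotype)
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    sxx = (g ** 2).sum(axis=0)
    beta = g.T @ y / sxx                      # per-SNP slope
    resid = y[:, None] - g * beta             # per-SNP residuals
    se = np.sqrt((resid ** 2).sum(axis=0) / (n - 2) / sxx)
    return np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in beta / se])


def naive_colocalization(genotypes, expression, trait, alpha=1e-4):
    """Indices of SNPs that are both an eQTL for one gene and a GWAS hit."""
    eqtl_p = marginal_pvalues(genotypes, expression)  # eQTL mapping step
    gwas_p = marginal_pvalues(genotypes, trait)       # GWAS step
    return np.flatnonzero((eqtl_p < alpha) & (gwas_p < alpha))


# Hypothetical simulation: SNP 0 acts on the trait through the gene,
# SNP 3 acts on the trait directly; the other SNPs are null.
rng = np.random.default_rng(1)
geno = rng.binomial(2, 0.3, size=(2000, 5)).astype(float)
expression = 0.5 * geno[:, 0] + rng.normal(size=2000)
trait = 0.8 * expression + 0.4 * geno[:, 3] + rng.normal(size=2000)
hits = naive_colocalization(geno, expression, trait)
```

Here the analysis can only report that SNP 0 co-localizes with the single gene that was tested; it has no way to express that a SNP's effect might travel through several interacting genes before reaching the trait.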

Methods

Results

Discussion

Conclusion