Abstract

Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes that covary in all of the samples or in only a subset of the samples. Our biclustering method, BicMix, allows overcomplete representations of the data, computational tractability, and joint modeling of unknown confounders and biological signals. Across diverse simulation scenarios, BicMix recovers latent structure with higher precision than state-of-the-art biclustering methods. Further, we develop a principled method to recover context-specific gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and to gene expression data from a cardiovascular study cohort, and we recover gene co-expression networks that are differential across ER+ and ER- samples and across male and female samples. We also apply BicMix to the Genotype-Tissue Expression (GTEx) pilot data and find tissue-specific gene networks. We validate these findings by using our tissue-specific networks to identify trans-eQTLs specific to one of four primary tissues.

Highlights

  • Recovering gene co-expression networks from high-throughput experiments that measure gene expression levels is essential for understanding the genetic regulation of complex traits

  • We focused on building differential gene co-expression networks across estrogen receptor (ER) positive (ER+) and ER negative (ER-) patients because of ER’s prognostic value in profiling breast cancer patients [72]: cancer patients that are ER+ are more likely to respond to endocrine therapies than patients that are ER-

  • To control for confounding effects, for all methods except Fabia and BicMix we removed the effects of five principal components of the gene expression matrix, separately within each tissue so as to maintain the tissue-specific effects. We found that both Plaid and Fabia were able to separate the tissues in the sample space, as was principal components analysis (PCA; S9 Fig)
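
The within-tissue principal component removal described in the last highlight can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the samples-by-genes layout, and the choice to keep each tissue's mean expression are all assumptions made for the example.

```python
import numpy as np

def remove_top_pcs(expr, n_pcs=5):
    """Subtract the top n_pcs principal components from a samples-by-genes matrix.

    Assumption: components are estimated on column-centered data, and the
    per-gene mean is left in place so tissue-level mean effects are kept.
    """
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=0)
    # Thin SVD: rows of vt are the principal axes in gene space.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_pcs, len(s))
    # Rank-k reconstruction of the dominant variation, removed from the data.
    return expr - (u[:, :k] * s[:k]) @ vt[:k]

def remove_pcs_within_tissue(expr, tissues, n_pcs=5):
    """Apply remove_top_pcs separately to the samples of each tissue."""
    expr = np.asarray(expr, dtype=float)
    tissues = np.asarray(tissues)
    out = np.empty_like(expr)
    for tissue in np.unique(tissues):
        mask = tissues == tissue
        out[mask] = remove_top_pcs(expr[mask], n_pcs)
    return out
```

Estimating the components within each tissue, rather than on the pooled matrix, avoids treating between-tissue differences as a confounder, since those differences would otherwise dominate the leading components.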

Introduction

Functional gene modules consist of subsets of genes that share similar expression patterns and perform coordinated cellular functions [1, 2]. If we consider each gene as a vertex in a network, pairs of genes within a gene module for which the correlation in expression levels cannot be explained by other genes may be connected by an undirected edge. Across all genes, these pairwise relationships constitute gene co-expression networks. Our work describes a rigorous approach to recovering undirected gene co-expression networks from gene expression data; the approach uses a probabilistic latent factor model to quantify the relationships between all pairs of genes.
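
To make the edge definition above concrete, the sketch below builds the gene covariance implied by a generic factor model, inverts it, and connects gene pairs whose partial correlation is non-negligible. It is an illustration under stated assumptions rather than the procedure used by BicMix: the function name, the unit-variance factors, the diagonal residual noise, and the threshold of 0.1 are all hypothetical choices.

```python
import numpy as np

def coexpression_edges(loadings, noise_var, threshold=0.1):
    """Return undirected edges (i, j) with |partial correlation| > threshold.

    loadings  : (genes, factors) loading matrix (hypothetical input)
    noise_var : (genes,) positive residual variances (hypothetical input)
    """
    loadings = np.asarray(loadings, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)
    # Covariance implied by a factor model with unit-variance factors.
    cov = loadings @ loadings.T + np.diag(noise_var)
    precision = np.linalg.inv(cov)
    # Partial correlation of genes i and j given all other genes:
    # rho_ij = -P_ij / sqrt(P_ii * P_jj).
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    n_genes = partial.shape[0]
    return [(i, j)
            for i in range(n_genes)
            for j in range(i + 1, n_genes)
            if abs(partial[i, j]) > threshold]

# Genes 0-2 load on one shared factor; genes 3 and 4 load on none.
example_loadings = np.array([[1.0], [1.0], [1.0], [0.0], [0.0]])
example_edges = coexpression_edges(example_loadings, np.ones(5))
```

In this toy input only the three genes that share a factor end up connected, while the two genes with zero loadings stay isolated, which is the behavior that sparse loadings are meant to produce.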
