Abstract

Adjustment for confounding sources of expression variation is an important preprocessing step in large gene expression studies, but the effect of confound adjustment on co-expression network analysis has not been well-characterized. Here, we demonstrate that the choice of confound adjustment method can have a considerable effect on the architecture of the resulting co-expression network. We compare standard and alternative confound adjustment methods and provide recommendations for their use in the construction of gene co-expression networks from bulk tissue RNA-seq datasets.

Highlights

  • Large-scale gene expression studies are often subject to technical and biological sources of expression variation including effects of batch, sample characteristics, and environmental factors

  • RUVCorr, CONFETI, and principal component (PC) adjustment are three alternative data correction approaches designed to identify and remove hidden confounds while retaining patterns of co-expression in the dataset. We compare these approaches to one popular standard method of hidden confound adjustment (PEER), known covariate adjustment, and a baseline uncorrected dataset

  • We identified co-expression modules using three module detection methods (Additional Files 1, 2, 3): weighted gene correlation network analysis (WGCNA), multiscale embedded gene co-expression network analysis (MEGENA), and independent component analysis (ICA) [18–20]
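
As a rough illustration of the PC adjustment idea mentioned in the highlights, the sketch below removes the leading principal components from a samples-by-genes expression matrix. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `remove_top_pcs`, the simulated data, and the choice of a single removed component are all hypothetical, and the source does not say how many components were removed or how the data were scaled.

```python
import numpy as np

def remove_top_pcs(expr, n_pcs):
    """Subtract the first n_pcs principal components from a samples x genes matrix.

    Hypothetical helper for illustration only; the number of components to
    remove is an assumption the caller must supply.
    """
    # Center each gene (column) so the SVD corresponds to PCA
    centered = expr - expr.mean(axis=0)
    # Thin SVD: u holds sample scores, vt holds gene loadings
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Rank-n_pcs reconstruction of the variation carried by the top PCs
    top = (u[:, :n_pcs] * s[:n_pcs]) @ vt[:n_pcs]
    return centered - top

# Simulated data (assumption): one hidden confound affecting every gene
rng = np.random.default_rng(0)
n_samples, n_genes = 50, 200
hidden = rng.normal(size=(n_samples, 1))
loadings = rng.normal(size=(1, n_genes)) * 3.0
expr = hidden @ loadings + rng.normal(size=(n_samples, n_genes))

adjusted = remove_top_pcs(expr, n_pcs=1)

def mean_abs_corr(matrix, factor):
    """Mean absolute Pearson correlation between each gene and a sample-level factor."""
    f = factor.ravel() - factor.mean()
    m = matrix - matrix.mean(axis=0)
    corr = (m.T @ f) / (np.linalg.norm(m, axis=0) * np.linalg.norm(f))
    return float(np.abs(corr).mean())

before = mean_abs_corr(expr, hidden)
after = mean_abs_corr(adjusted, hidden)
```

In this toy setting the simulated confound dominates the first component, so `after` should be much smaller than `before`. In real data the leading components can also carry genuine co-expression signal, which is the trade-off the comparison above is concerned with.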


Summary

Introduction

Large-scale gene expression studies are often subject to technical and biological sources of expression variation, including effects of batch, sample characteristics, and environmental factors. Confounding factors can be documented sources of expression variation (known covariates) or derived empirically from the expression dataset (hidden covariates), and adjusting for these factors has become common practice in many population-level gene expression studies. While the benefits of confounding factor correction have been well-characterized in analyses of differential expression and expression quantitative trait locus mapping [2–5], the effects of confounding factor correction on studies of gene co-expression are less well understood (see [6, 7]). This is because confounding factors are difficult to distinguish from gene co-expression, as both induce patterns of correlation between genes. Because distinguishing regulatory effects from artifacts is difficult, researchers have historically performed no data correction or known covariate

