Abstract

Technical variation plays an important role in microarray-based gene expression studies, and batch effects explain a large proportion of this noise. It is therefore essential to eliminate technical variation while maintaining biological variability. Several strategies have been proposed for the removal of batch effects, although they have not been evaluated in large-scale longitudinal gene expression data. In this study, we aimed to identify a suitable method for batch effect removal in a large study of microarray-based longitudinal gene expression. Monocytic gene expression was measured in 1092 participants of the Gutenberg Health Study at baseline and at 5-year follow-up. Replicates of selected samples were measured at both time points to identify technical variability. Deming regression, Passing-Bablok regression, linear mixed models and non-linear models, as well as ReplicateRUV and ComBat, were applied to eliminate batch effects between replicates. In a second step, quantile normalization was performed prior to batch effect correction for each method. Technical variation between batches was evaluated by principal component analysis. Associations between body mass index and transcriptomes were calculated before and after batch effect removal, and the results of these association analyses were compared to evaluate maintenance of biological variability. Quantile normalization, performed separately in each batch, combined with ComBat successfully reduced batch effects and maintained biological variability. ReplicateRUV performed perfectly in the replicate data subset of the study, but failed when applied to all samples. None of the other methods substantially reduced batch effects in the replicate data subset. Quantile normalization plus ComBat appears to be a valuable approach for batch correction in longitudinal gene expression data.
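The workflow described above (quantile normalization within each batch, a batch adjustment step, then principal component analysis to check for residual batch structure) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study: the data are simulated, all function names are hypothetical, and the adjustment step is a plain per-gene location-scale correction used as a simplified stand-in for ComBat (it has no empirical Bayes shrinkage and no covariates).

```python
import numpy as np

def quantile_normalize(x):
    """Give every column (sample) of a genes x samples matrix the same distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)   # rank of each value within its sample
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)    # reference distribution
    return mean_quantiles[ranks]                        # ties are broken arbitrarily

def quantile_normalize_per_batch(x, batch):
    """Apply quantile normalization separately within each batch."""
    out = np.empty_like(x, dtype=float)
    for b in np.unique(batch):
        cols = batch == b
        out[:, cols] = quantile_normalize(x[:, cols])
    return out

def center_scale_batches(x, batch):
    """Simplified stand-in for ComBat (assumption): align per-gene mean and SD across batches."""
    batches = np.unique(batch)
    grand_mean = x.mean(axis=1, keepdims=True)
    within_sd = np.mean(
        [x[:, batch == b].std(axis=1, ddof=1) for b in batches], axis=0
    )[:, None]
    out = np.empty_like(x, dtype=float)
    for b in batches:
        cols = batch == b
        mu = x[:, cols].mean(axis=1, keepdims=True)
        sd = x[:, cols].std(axis=1, ddof=1, keepdims=True)
        sd = np.where(sd > 0, sd, 1.0)                  # guard against constant genes
        out[:, cols] = (x[:, cols] - mu) / sd * within_sd + grand_mean
    return out

def batch_separation(x, batch):
    """Standardized gap between the two batch means on the first principal component."""
    centered = x - x.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = s[0] * vt[0]                               # PC1 score of each sample
    gap = scores[batch == 0].mean() - scores[batch == 1].mean()
    return abs(gap) / scores.std()

# Simulated data (assumption): 200 genes, two batches of 20 samples each;
# the second batch has a gene-specific offset and inflated noise.
rng = np.random.default_rng(0)
n_genes, n_per_batch = 200, 20
batch = np.repeat([0, 1], n_per_batch)
gene_means = rng.normal(8.0, 1.0, size=(n_genes, 1))
noise = rng.normal(0.0, 1.0, size=(n_genes, 2 * n_per_batch))
offset = rng.normal(0.0, 2.0, size=(n_genes, 1))
raw = gene_means + noise
raw[:, batch == 1] = gene_means + offset + 1.5 * noise[:, batch == 1]

normalized = quantile_normalize_per_batch(raw, batch)
corrected = center_scale_batches(normalized, batch)
sep_before = batch_separation(raw, batch)
sep_after = batch_separation(corrected, batch)
print(f"PC1 batch separation before: {sep_before:.2f}, after: {sep_after:.2e}")
```

In this toy setting the batches separate clearly on the first principal component before correction and no longer do so afterwards. It says nothing about whether biological variability is preserved, which the study assessed separately through association analyses with body mass index.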

Highlights

  • Gene expression profiles measured by microarrays are subject to variations caused by biological and technical effects

  • Taken together with the results from hierarchical clustering, these findings clearly show maintained biological variability of overall gene expression after batch effect removal through quantile normalization followed by ComBat

  • The combination of quantile normalization and ComBat in large-scale, longitudinal gene expression data is the best approach for removal of batch effects in our study dataset

Summary

Introduction

Gene expression profiles measured by microarrays are subject to variations caused by biological and technical effects. Systematic differences resulting from biological conditions are of interest, whereas technical variation should be minimal. RNA quality and sample storage time influence the overall variation of transcriptomes [2]. Because technical limitations restrict the number of samples that can be processed at once, large sample sets cannot be processed in a single batch. RNA quality, sample storage time and plate layout are important additional technical factors that influence association analyses between gene expression data and common disease risk factors [2]. Batch effects cannot be avoided in studies comprising a large number of subjects, and removal of these effects is necessary for reliable differential expression analysis.

