Abstract

Background: Illumina DNA methylation arrays are high-throughput platforms for cost-effective, genome-wide profiling of individual CpGs. Experimental and technical factors introduce appreciable measurement variation, some of which can be mitigated by careful “preprocessing” of raw data.

Methods: Here we describe the ENmix preprocessing pipeline and compare it with seven published alternative pipelines (ChAMP, Illumina, SWAN, Funnorm, Noob, wateRmelon, and RnBeads). We use two large sets of duplicate sample measurements from 450K and EPIC arrays, along with mixtures of isogenic methylated and unmethylated cell line DNA, to compare raw data with data preprocessed by each pipeline.

Results: Our evaluations show that the ENmix pipeline performs best, with significantly higher correlation and lower absolute difference between duplicate pairs, higher intraclass correlation coefficients (ICC), and smaller deviations from expected methylation levels in mixture experiments. In addition to the pipeline function, the ENmix software provides an integrated set of functions for reading raw data files from mouse and human arrays, quality control, data preprocessing, visualization, detection of differentially methylated regions (DMRs), estimation of cell type proportions, and calculation of methylation age clocks. ENmix is computationally efficient and flexible, and it supports parallel computing. To facilitate further evaluations, we make all datasets and evaluation code publicly available.

Conclusion: Careful selection of robust data preprocessing methods is critical for DNA methylation array studies. In our evaluations, ENmix outperformed the other pipelines in minimizing experimental variation and in improving data quality and study power.
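The mixture-experiment metric (deviation from the expected methylation level) can be illustrated with a minimal Python sketch. It is not the ENmix R code. It assumes that, at each CpG, the expected level of a mixture is linear in the fraction p of methylated DNA, expected = p * beta_meth + (1 - p) * beta_unmeth, with the pure methylated and unmethylated references idealised as 1 and 0. The function name `mixture_deviation` and the toy values are hypothetical.

```python
import numpy as np


def mixture_deviation(betas, fractions, beta_meth=1.0, beta_unmeth=0.0):
    """Mean absolute deviation of observed beta values from expected mixture levels.

    betas: array of shape (n_cpgs, n_mixtures) with observed beta values.
    fractions: methylated-DNA fraction of each mixture (length n_mixtures).
    beta_meth, beta_unmeth: assumed methylation levels of the pure methylated and
        unmethylated DNA (idealised here as 1 and 0; this is an assumption).
    Returns one mean absolute deviation per mixture.
    """
    betas = np.asarray(betas, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    # Linear mixing model: expected = p * beta_meth + (1 - p) * beta_unmeth
    expected = fractions * beta_meth + (1.0 - fractions) * beta_unmeth
    # expected has shape (n_mixtures,) and broadcasts across the CpG rows
    return np.mean(np.abs(betas - expected), axis=0)


# Toy example: 2 CpGs measured in mixtures with 0%, 50% and 100% methylated DNA
observed = [[0.05, 0.52, 0.93],
            [0.10, 0.45, 0.97]]
print(mixture_deviation(observed, [0.0, 0.5, 1.0]))
```

A smaller deviation for a given preprocessing pipeline would indicate measured levels closer to the known mixing proportions; in practice, the reference levels could be estimated from the pure samples instead of being fixed at 1 and 0.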

Highlights

  • Illumina Infinium Methylation BeadChips are widely used to measure individual CpG methylation on an epigenome-wide scale

  • In addition to the preprocessing pipeline function, the ENmix (exponential–normal mixture model) R software provides a set of functions to facilitate large-scale epigenetic analyses, including direct import of IDAT files and Illumina manifest files, quality control measures, imputation, surrogate variable analysis of batch effects using internal control probes, intraclass correlation coefficient (ICC) calculation, epigenetic clocks, differentially methylated region (DMR) analysis, and estimation of blood cell proportions

  • Evaluation results: we applied each of the preprocessing pipelines listed in the Methods to the technical duplicate datasets, using each pipeline’s recommended default parameter values, to evaluate how much concordance between duplicates improved (see the evaluation R code in Additional file 1)
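The duplicate-concordance measures named in the abstract (correlation and absolute difference between duplicate pairs, and ICC) can be sketched in Python as follows. This is not the authors’ evaluation R code from Additional file 1. The function names are hypothetical, and the ICC shown is the one-way random-effects ICC(1,1), ICC = (MSB - MSW) / (MSB + (k - 1) * MSW); whether the study used this particular ICC form is an assumption.

```python
import numpy as np


def duplicate_concordance(rep1, rep2):
    """Concordance between two technical duplicates of one sample.

    rep1, rep2: beta values for the same CpGs measured on two arrays.
    Returns (Pearson correlation, mean absolute difference).
    """
    rep1 = np.asarray(rep1, dtype=float)
    rep2 = np.asarray(rep2, dtype=float)
    r = np.corrcoef(rep1, rep2)[0, 1]
    mean_abs_diff = np.mean(np.abs(rep1 - rep2))
    return r, mean_abs_diff


def icc_oneway(x):
    """One-way random-effects ICC(1,1) for a single CpG.

    x: array of shape (n_samples, k) holding k replicate measurements per sample.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    sample_means = x.mean(axis=1)
    # Between-sample mean square (n - 1 degrees of freedom)
    msb = k * np.sum((sample_means - grand_mean) ** 2) / (n - 1)
    # Within-sample mean square (n * (k - 1) degrees of freedom)
    msw = np.sum((x - sample_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Toy duplicate pair measured at three CpGs
r, mad = duplicate_concordance([0.10, 0.50, 0.90], [0.12, 0.48, 0.91])
print(r, mad)

# One CpG measured in duplicate (k = 2) for three samples
print(icc_oneway([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

Correlation and mean absolute difference summarize each duplicate pair across CpGs, whereas the ICC is computed per CpG across samples, comparing between-sample variation with the variation between duplicates; a preprocessing method that reduces technical noise should raise the ICC.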


Summary

Introduction

Illumina Infinium Methylation BeadChips are widely used to measure individual CpG methylation on an epigenome-wide scale. Experimental and technical factors introduce appreciable measurement variation, some of which can be mitigated by careful “preprocessing” of raw data. It is difficult for even experienced investigators to select from among the diverse preprocessing methods and implement them in their own array analyses. We describe the combination of such methods into the ENmix preprocessing pipeline, named after our original background correction method, and describe features of the extended ENmix methylation analysis software.
