Abstract
The Illumina HumanMethylation450 BeadChip is increasingly utilized in epigenome-wide association studies (EWAS); however, this array-based measurement of DNA methylation is subject to measurement variation. Appropriate data preprocessing to remove background noise is important for detecting the small changes that may be associated with disease. We developed a novel background correction method, ENmix, that uses a mixture of exponential and truncated normal distributions to flexibly model signal intensity and a truncated normal distribution to model background noise. Depending on data availability, we employ one of three approaches to estimate the parameters of the background normal distribution: (i) internal chip negative controls, (ii) out-of-band Infinium I probe intensities or (iii) combined methylated and unmethylated intensities. We evaluate ENmix against other available methods for both reproducibility among duplicate samples and accuracy of methylation measurement among laboratory control samples. ENmix outperformed other background correction methods on both of these measures and substantially reduced the probe-design type bias between Infinium I and II probes. In a reanalysis of existing EWAS data, we show that ENmix can identify additional CpGs and yields smaller P-values for previously validated CpGs. We incorporated the method into the R package ENmix, which is freely available from the Bioconductor website.
Highlights
DNA methylation is essential for normal human development and regulation of gene expression, while aberrant methylation has been linked with a number of human diseases [1,2].
Complex diseases can be associated with very small differences in DNA methylation profiles [14,18]. Measurement of those profiles using Infinium HumanMethylation450 BeadChips can be affected by many experimental factors [8], which can be mitigated in part by careful data preprocessing.
We proposed a novel background correction method, ENmix, which models the methylation signal intensity with a flexible exponential-normal mixture distribution and models background noise with a truncated normal distribution.
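To make the idea of model-based background correction concrete, the following is a minimal sketch in Python and not the ENmix implementation. It assumes an additive model in which the observed intensity is signal plus background, estimates the background mean and standard deviation from negative-control intensities (in the spirit of the negative-control approach), and then applies a simpler stand-in for the signal model: a single exponential distribution with an untruncated normal background, rather than the exponential and truncated normal mixture that ENmix uses. The function names, the sample values and the choice of plain mean and standard deviation as estimators are all illustrative assumptions.

```python
import math
import statistics


def estimate_background(neg_controls):
    """Estimate background mean and SD from negative-control intensities.

    Assumption: plain sample mean and SD; ENmix may use other estimators.
    """
    mu = statistics.mean(neg_controls)
    sigma = statistics.stdev(neg_controls)
    return mu, sigma


def _normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normexp_signal(x, mu, sigma, alpha):
    """Expected signal given observed intensity x.

    Simplified stand-in model (NOT the ENmix mixture):
      x = signal + background
      signal     ~ Exponential with mean alpha
      background ~ Normal(mu, sigma^2)
    Under this model, signal given x is a normal distribution with mean a
    and SD sigma, truncated to be non-negative.
    Note: for intensities far below the background mean the CDF term can
    underflow to zero; a production version would need to guard against that.
    """
    a = x - mu - sigma * sigma / alpha
    return a + sigma * _normal_pdf(a / sigma) / _normal_cdf(a / sigma)


# Hypothetical negative-control intensities for one colour channel.
neg_controls = [90.0, 100.0, 110.0, 95.0, 105.0]
mu, sigma = estimate_background(neg_controls)

# alpha (mean signal) would normally be estimated from the probe data;
# here it is a made-up value for illustration.
alpha = 2000.0
corrected = [normexp_signal(x, mu, sigma, alpha) for x in (100.0, 200.0, 5000.0)]
```

For intensities well above background, the corrected value is close to the observed intensity minus the background mean; near the background level it shrinks toward zero but stays positive, which is the main practical advantage over simple subtraction.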
Summary
DNA methylation is essential for normal human development and regulation of gene expression, while aberrant methylation has been linked with a number of human diseases [1,2]. The advance of DNA methylation arrays in recent years has enabled large-scale epigenome-wide studies at single CpG site resolution. The Illumina Infinium HumanMethylation450 BeadChip [3] is currently the most commonly utilized array, providing estimates of methylation level at about half a million individual CpG sites. The array measures probe hybridization intensities for bisulfite-converted DNA to estimate the relative abundance of methylated and unmethylated cytosines at selected loci. These quantitative measures are sensitive to variations in experimental conditions [4]. The array uses probes with two different chemistries (Infinium I and Infinium II) and two fluorescent dyes (Cy3-green/Cy5-red), which introduces further complexity into the resulting data.
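The relative abundance described above is commonly summarized per CpG site as a beta value, the methylated intensity divided by the total intensity. The sketch below assumes the conventional form with a small stabilizing offset of 100 and negative intensities clamped to zero; the function name and the example intensities are illustrative and are not taken from the text.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation level in [0, 1) from a pair of probe intensities.

    Assumption: conventional beta value with a constant offset that
    stabilizes the ratio when both intensities are low.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Hypothetical intensities for three CpG sites.
fully_methylated = beta_value(900.0, 0.0)
half_methylated = beta_value(450.0, 450.0)
unmethylated = beta_value(0.0, 900.0)
```

Because the beta value is a ratio of intensities, any background noise left in either channel pulls it away from 0 or 1, which is why background correction matters for detecting small methylation differences.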