Abstract
We propose an extension to quantile normalization that removes unwanted technical variation using control probes. We adapt our algorithm, functional normalization, to the Illumina 450k methylation array and address the open problem of normalizing methylation data with global epigenetic changes, such as human cancers. Using data sets from The Cancer Genome Atlas and a large case–control study, we show that our algorithm outperforms all existing normalization methods with respect to replication of results between experiments, and yields robust results even in the presence of batch effects. Functional normalization can be applied to any microarray platform, provided suitable control probes are available.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0503-2) contains supplementary material, which is available to authorized users.
Highlights
In humans, DNA methylation is an important epigenetic mark occurring at CpG dinucleotides, which is implicated in gene silencing
The analysis of the 450k data does not include sample batch in the model, which allows us to see how well the different normalization methods remove technical artifacts introduced by batch differences. While both analyses are conducted on the full set of CpGs, we focus on the CpGs common to the two platforms and ask: ‘What is the degree of agreement between the top k differentially methylated positions (DMPs) identified using the two different platforms?’ Figure 4a shows that functional normalization and noob outperform both quantile normalization and raw data for all values of k, and functional normalization is marginally better than noob for some values of k
We have shown that this method is especially valuable for normalizing large-scale studies where we expect substantial global differences in methylation, such as in cancer studies or when comparing between tissues, and when the goal is to perform inference at the probe level
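The top-k agreement question posed in the highlights can be illustrated with a minimal sketch. It assumes two vectors of per-CpG p-values, one per platform, aligned on the CpGs the platforms share. The function name `top_k_overlap` and the toy p-values are hypothetical and are not taken from the paper, and the fraction of shared positions is one plausible agreement measure rather than necessarily the one plotted in Figure 4a.

```python
import numpy as np

def top_k_overlap(pvals_a, pvals_b, ks):
    """Fraction of the top-k positions shared by two rankings.

    pvals_a, pvals_b: 1-D arrays of p-values for the same CpGs, in the
    same order (smaller = more significant). Hypothetical inputs.
    ks: iterable of list sizes k to evaluate.
    """
    pvals_a = np.asarray(pvals_a, dtype=float)
    pvals_b = np.asarray(pvals_b, dtype=float)
    if pvals_a.shape != pvals_b.shape:
        raise ValueError("both platforms must cover the same CpGs")

    # Rank CpGs from most to least significant on each platform.
    order_a = np.argsort(pvals_a, kind="stable")
    order_b = np.argsort(pvals_b, kind="stable")

    overlap = {}
    for k in ks:
        top_a = set(order_a[:k].tolist())
        top_b = set(order_b[:k].tolist())
        overlap[k] = len(top_a & top_b) / k
    return overlap

# Toy example with five shared CpGs (made-up p-values).
platform_a = [0.001, 0.20, 0.03, 0.50, 0.004]
platform_b = [0.002, 0.01, 0.40, 0.60, 0.03]
agreement = top_k_overlap(platform_a, platform_b, ks=[1, 2, 3])
```

Plotting the returned fraction against k for each normalization method gives a concordance curve; a method whose curve lies higher replicates more of its top hits across platforms.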
Summary
DNA methylation is an important epigenetic mark occurring at CpG dinucleotides, which is implicated in gene silencing. In 2011, Illumina released the HumanMethylation450 bead array [1], known as the 450k array. This array has enabled population-level studies of DNA methylation by providing a cheap, high-throughput and comprehensive assay for DNA methylation. Applications of this array to population-level data include epigenome-wide association studies (EWAS) [2,3] and large-scale cancer studies, such as the ones available through The Cancer Genome Atlas (TCGA). Studies of DNA methylation in cancer pose a challenging problem for array normalization. The authors note that not using normalization is better than using the methods they evaluated, highlighting the importance of benchmarking any method against raw data.
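For orientation, the standard quantile normalization that the abstract describes extending can be sketched as follows. This is plain quantile normalization only: it forces every sample to share one empirical distribution and does not include the control-probe step that defines functional normalization. The function name `quantile_normalize` and the toy matrix are illustrative assumptions, not part of the paper or of any package.

```python
import numpy as np

def quantile_normalize(x):
    """Plain quantile normalization of a probes-by-samples matrix.

    Each column (sample) is mapped onto a common reference distribution,
    taken here as the mean of the sorted columns. Ties are broken by
    row order rather than averaged, which is a simplification.
    """
    x = np.asarray(x, dtype=float)

    # Sort each sample's intensities independently.
    order = np.argsort(x, axis=0, kind="stable")
    sorted_x = np.take_along_axis(x, order, axis=0)

    # Reference distribution: mean across samples at each rank.
    reference = sorted_x.mean(axis=1)

    # Put the reference value for each rank back at the original position.
    out = np.empty_like(x)
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out

# Toy matrix: four probes (rows) measured in three samples (columns).
raw = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(raw)
```

After this step every column contains the same set of values, so any genuine global shift between samples is removed along with technical variation. That property is why studies with global methylation differences, such as the cancer studies mentioned above, are a difficult case for this kind of normalization.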