Abstract

Background: DNA methylation offers an excellent example for elucidating how epigenetic information affects gene expression. β values and M values are commonly used to quantify DNA methylation. Statistical methods applicable to DNA methylation data analysis span a number of approaches, such as the Wilcoxon rank sum test, t-test, Kolmogorov–Smirnov test, permutation test, empirical Bayes method, and bump hunting method. Nonetheless, selecting an optimal statistical method can be challenging when different methods generate inconsistent results from the same data set.

Results: We compared six statistical approaches relevant to DNA methylation microarray analysis in terms of false discovery rate (FDR) control, statistical power, and stability through simulation studies and real data examples. Differences between β values and M values were observed only when methylation levels were correlated across CpG loci. For small sample sizes (n=3 or 6 in each group), both the empirical Bayes and bump hunting methods showed appropriate FDR control and the highest power when methylation levels across CpG loci were independent. Only the bump hunting method showed appropriate FDR control and the highest power when methylation levels across CpG loci were correlated. For medium (n=12 in each group) and large (n=24 in each group) sample sizes, all methods compared had similar power, except for the permutation test whenever the proportion of differentially methylated loci was low. For all sample sizes, the bump hunting method had the lowest stability, measured as the standard deviation of total discoveries, whenever the proportion of differentially methylated loci was large. The apparent test power comparisons based on raw p-values from DNA methylation studies on ovarian cancer and rheumatoid arthritis were consistent with the results of the simulation studies. Overall, these results provide guidance for selecting optimal statistical methods under different scenarios.

Conclusions: For DNA methylation studies with small sample sizes, the bump hunting method and the empirical Bayes method are recommended when DNA methylation levels across CpG loci are independent, while only the bump hunting method is recommended when DNA methylation levels are correlated across CpG loci. All methods are acceptable for medium or large sample sizes.

Highlights

  • DNA methylation offers an excellent example for elucidating how epigenetic information affects gene expression. β values and M values are commonly used to quantify DNA methylation

  • Large-scale examination of DNA methylation through microarray or sequencing technologies makes epigenome-wide association studies (EWAS) feasible. Such studies explore associations between DNA methylation and cancers, both in the sustained effort to develop novel anti-cancer drugs and to identify DNA methylation markers for the prognosis and diagnosis of certain cancers [6]

  • The standard output from the BeadChip assay for quantifying methylation is the β value, which is calculated from the intensity of methylated allele (Max(M, 0)) and the intensity of unmethylated allele (Max(U, 0)) according to the following formula [7]


Summary

Introduction

DNA methylation is a biochemical process that adds a methyl group at the 5-carbon of the cytosine ring to form 5-methylcytosine, found at cytosine-guanosine dinucleotides (CpGs), and it plays a significant role in the development and progression of human disease [1]. The Illumina HumanMethylation BeadChip technology is a popular platform for conducting epigenome-wide association studies. Illumina has developed three platforms for DNA methylation assays: GoldenGate, Infinium Human Methylation and Infinium HumanMethylation450 BeadChip. The standard output from the BeadChip assay for quantifying methylation is the β value, which is calculated from the intensity of the methylated allele (Max(M, 0)) and the intensity of the unmethylated allele (Max(U, 0)) according to the following formula [7]
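
The formula itself is not reproduced in this excerpt. As a minimal sketch, assuming the commonly cited Illumina definition β = Max(M, 0) / (Max(M, 0) + Max(U, 0) + α) with an offset α = 100, the calculation might look like the code below. The offset value, the function names, and the logit-style conversion from β values to M values are assumptions for illustration and are not taken from the text above.

```python
import math

# Assumed offset that stabilises beta when both intensities are low.
# The value 100 is the commonly cited default, not stated in this excerpt.
ALPHA = 100.0


def beta_value(meth, unmeth, alpha=ALPHA):
    """Return beta = max(M, 0) / (max(M, 0) + max(U, 0) + alpha)."""
    m = max(meth, 0.0)    # negative intensities are truncated at zero
    u = max(unmeth, 0.0)
    return m / (m + u + alpha)


def m_value_from_beta(beta):
    """Return log2(beta / (1 - beta)), an assumed logit-style M value."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


if __name__ == "__main__":
    # Hypothetical intensities for a single CpG locus.
    b = beta_value(900.0, 0.0)   # 900 / (900 + 0 + 100) = 0.9
    print(b, m_value_from_beta(b))
```

Under these assumptions, β stays within [0, 1) and is easy to read as an approximate proportion of methylation, while the M value is unbounded and symmetric around zero (β = 0.5).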
