Abstract

A number of empirical Bayes models (each with different statistical distribution assumptions) have now been developed to analyze differential DNA methylation using high-density oligonucleotide tiling arrays. However, it remains unclear which model performs best. For example, when differentially methylated regions are analyzed for conservative and functional sequence characteristics (e.g., enrichment of transcription factor-binding sites (TFBSs)), the sensitivity of such analyses under the various empirical Bayes models is not known. In this paper, five empirical Bayes models were constructed, based on either a gamma distribution or a log-normal distribution, for the identification of differentially methylated loci and their cell-division-dependent (1, 3, and 5) and drug-treatment-dependent (cisplatin) methylation patterns. While differential methylation patterns generated by log-normal models were enriched with numerous TFBSs, we observed almost no TFBS-enriched sequences using gamma assumption models. Statistical and biological results suggest that a log-normal, rather than gamma, distribution assumption yields a highly accurate and precise empirical Bayes method for differential methylation microarray analysis. In addition, we present one of the log-normal models for differential methylation analysis and test its reproducibility by a simulation study. We believe this research to be the first extensive comparison of statistical modeling for the analysis of differential DNA methylation, an important biological phenomenon that precisely regulates gene transcription.

Highlights

  • High-density oligonucleotide tiling arrays have been widely utilized to globally analyze chromatin modifications across entire genomes, including assessments of DNA methylation, in addition to the identification of transcription factor binding sites [1,2,3,4,5,6,7]

  • The Binary-Log-Normal-Normal-Normal Model (BLNNN) had no enriched transcription factor-binding sites (TFBSs) among the randomly differentially methylated loci, while the gamma models had essentially no enriched TFBSs in any of the three methylation categories (the Binary-Gamma-Gamma Model (BGG) identified four enriched TFBSs among randomly differentially methylated loci). These results indicate that TFBS enrichment analysis is highly sensitive to the empirical Bayes model distribution assumption and that stochastically differentially methylated loci selected by log-normal models are more sensitive for TFBS enrichment than those selected by the gamma models

  • Using the same experiment sets and the same methods to calculate TFBS enrichments (Table 4) as in our previous study [30], we found no TFBS enrichment in stochastic hypo- or hypermethylation identified by the gamma models, which indicates inaccurate identification of differential methylation


Summary

Introduction

High-density oligonucleotide tiling arrays have been widely utilized to globally analyze chromatin modifications across entire genomes, including assessments of DNA methylation, in addition to the identification of transcription factor binding sites [1,2,3,4,5,6,7]. Numerous statistical inference frameworks have been developed for microarray differential analysis, including empirical Bayes [10], nonempirical Bayes [11], and frequentist approaches [12]. Numerous empirical Bayes methods and algorithms have been applied to analyze microarray-based studies, including gene expression [13,14,15,16], protein-to-DNA binding (chromatin-immunoprecipitation (ChIP)) [17, 18], and DNA methylation [19, 20]. In this study, we performed a comparison of the accuracy of various empirical Bayes models for analyzing these universally utilized biological assessments.
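
To make the distribution-assumption question concrete, the following is a minimal sketch, not one of the five models constructed in the paper. It simulates positive array-like intensities, fits a two-parameter log-normal and a two-parameter gamma distribution to them, and compares the resulting log-likelihoods. The simulated data, the parameter values, the function names (`lognormal_loglik`, `gamma_loglik`), and the use of the method of moments for the gamma fit are all assumptions made for illustration.

```python
import math
import random


def lognormal_loglik(xs):
    """Fit a log-normal by maximum likelihood; return the fitted log-likelihood."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE variance of the log-values
    # log f(x) = -log(x) - 0.5*log(2*pi*var) - (log(x) - mu)^2 / (2*var)
    return sum(
        -v - 0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        for v in logs
    )


def gamma_loglik(xs):
    """Fit a gamma by the method of moments (a simplification, not the MLE);
    return the log-likelihood at the fitted parameters."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    shape = mean ** 2 / var
    scale = var / mean
    # log f(x) = (shape-1)*log(x) - x/scale - lgamma(shape) - shape*log(scale)
    return sum(
        (shape - 1) * math.log(x) - x / scale
        - math.lgamma(shape) - shape * math.log(scale)
        for x in xs
    )


# Hypothetical data: 2000 intensities drawn from a log-normal distribution.
rng = random.Random(0)
intensities = [rng.lognormvariate(6.0, 1.0) for _ in range(2000)]

ll_lognormal = lognormal_loglik(intensities)
ll_gamma = gamma_loglik(intensities)
print(f"log-normal log-likelihood: {ll_lognormal:.1f}")
print(f"gamma log-likelihood:      {ll_gamma:.1f}")
```

Because both fits use two parameters, the log-likelihoods can be compared directly. On data that are truly log-normal, the log-normal fit should score higher; on real array data, which assumption fits better is an empirical question of the kind this study addresses.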

