Abstract

Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and by the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to simultaneously account for both the over-dispersed, count-based nature of bisulfite sequencing data and genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html.
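
To make the structure of a binomial mixed model concrete, the sketch below simulates methylated read counts at a single site. It is not MACAU and does not use its interface. The logit link, the function name `simulate_site`, all parameter names and default values, and the encoding of relatedness as shared family membership (rather than a full kinship matrix) are assumptions made for this illustration only. It shows the three ingredients the abstract names: coverage that varies across individuals, over-dispersion, and covariance among relatives.

```python
import math
import random


def simulate_site(coverage, family, predictor, beta=0.5, intercept=0.0,
                  sigma2=1.0, h2=0.5, seed=0):
    """Simulate methylated read counts at one site (illustrative only).

    Assumptions not taken from the source: a logit link, and relatedness
    encoded as family labels, so relatives share one heritable effect.
    """
    rng = random.Random(seed)
    # Heritable component: one draw per family, shared by all its members,
    # which induces covariance among relatives.
    fam_effect = {f: rng.gauss(0.0, math.sqrt(sigma2 * h2))
                  for f in sorted(set(family))}
    counts = []
    for r_i, f_i, x_i in zip(coverage, family, predictor):
        # Independent per-individual noise: the source of over-dispersion
        # relative to a plain binomial model.
        e_i = rng.gauss(0.0, math.sqrt(sigma2 * (1.0 - h2)))
        eta = intercept + beta * x_i + fam_effect[f_i] + e_i
        pi_i = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
        # Binomial draw written as a sum of r_i Bernoulli trials.
        y_i = sum(rng.random() < pi_i for _ in range(r_i))
        counts.append(y_i)
    return counts


# Hypothetical inputs: total reads per individual, family labels, and a
# standardized predictor such as age.
coverage = [5, 40, 12, 0, 25, 8]
family = ["A", "A", "B", "B", "C", "C"]
age = [0.1, 1.2, -0.4, 0.8, -1.0, 0.3]
counts = simulate_site(coverage, family, age)
```

Each individual contributes a count out of its own read total, so an individual with zero coverage carries no information about the site, while the shared family term makes relatives' methylation levels correlated.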

Highlights

  • DNA methylation—the covalent addition of methyl groups to cytosine bases—is a major epigenetic gene regulatory mechanism observed in a wide variety of species

  • DNA methylation levels are known to be heritable and are affected by kinship and population structure, but existing approaches for modeling bisulfite sequencing data fail to account for this covariance

  • Using simulations and two real data sets, we demonstrate that our model provides well-calibrated p-values and improves power compared with previous methods

Introduction

DNA methylation—the covalent addition of methyl groups to cytosine bases—is a major epigenetic gene regulatory mechanism observed in a wide variety of species. DNA methylation levels are strongly linked to disease, including major public health burdens such as diabetes [7,8], Alzheimer’s disease [9,10], and many forms of cancer [7,11,12,13,14,15]. Together, these observations point to a central role for DNA methylation in shaping genome architecture, influencing development, and driving trait variation. We present an approach designed for analyses of differential methylation levels in bisulfite sequencing data sets.
