Abstract

Background/Aims: Recently, next-generation sequencing-based technologies have enabled DNA methylation profiling at high resolution and low cost. Methyl-Seq and Reduced Representation Bisulfite Sequencing (RRBS) are two such technologies that interrogate methylation levels at CpG sites throughout the entire human genome. With the rapid reduction of sequencing costs, these technologies will enable epigenotyping of large cohorts for phenotypic association studies. Existing quantification methods for sequencing-based methylation profiling are simplistic and do not account for the noise due to the random sampling nature of sequencing and various experimental artifacts. Therefore, there is a need to investigate the statistical issues related to the quantification of methylation levels for these emerging technologies, with the goal of developing an accurate quantification method.

Methods: In this paper, we propose two methods for Methyl-Seq quantification. The first method, the Maximum Likelihood estimate, is both conceptually intuitive and computationally simple. However, this estimate is biased at extreme methylation levels and does not provide variance estimation. The second method, based on a Bayesian hierarchical model, allows variance estimation of methylation levels and provides a flexible framework to adjust for technical bias in the sequencing process.

Results: We compare the previously proposed binary method, the Maximum Likelihood (ML) method, and the Bayesian method. In both simulation and real data analysis of Methyl-Seq data, the Bayesian method offers the most accurate quantification. The ML method is slightly less accurate than the Bayesian method, but both of our proposed methods outperform the original binary method in Methyl-Seq. In addition, we applied these quantification methods to simulation data and show that, with sequencing depth above 40–300 per cleavage site (the threshold varies with the tissue sample), Methyl-Seq offers quantification consistency comparable to that of microarrays.

Highlights

  • We propose two methods for Methyl-Seq quantification

  • DNA methylation is an epigenetic regulatory mechanism implicated in various human diseases [1,2]. Cytosine nucleotides in DNA molecules, primarily in the CpG context, may be methylated, and changes in DNA methylation status can modulate the expression levels of genes [3,4,5,6,7] and phenotype [8,9,10,11]. In the past, measurement of DNA methylation was only feasible and affordable for a small number of individuals at a limited number of sites

  • The Maximum Likelihood (ML) method is slightly less accurate than the Bayesian method. Both of our proposed methods outperform the original binary method in Methyl-Seq. We applied these quantification methods to simulation data and show that, with sequencing depth above 40–300 per cleavage site, Methyl-Seq offers quantification consistency comparable to that of microarrays


Summary

Methods

Using next-generation sequencing, Methyl-Seq assays over 250,000 methylation-sensitive restriction enzyme cleavage sites grouped into over 90,000 regions.

2.3 Truncated Proportional Estimate

We assume the following model for generating the tag counts in a Methyl-Seq experiment. For a region with $K$ assayable CCGG sites, the HpaII tag count at the $i$-th site in the $j$-th technical replicate, $y_{ij}$, is generated by first drawing $x'_i$ total sequencing tags and then subsampling them by a fraction $(1-m)$, where $m$ is the methylation level of the region. In other words, $y_{ij} \mid x'_i \sim \mathrm{Binomial}(x'_i,\, 1-m)$, where $x'_i$ is the corresponding unobserved MspI tag count, generated from the same distribution as $x_i$: $\mathrm{Poisson}(\lambda_i)$ or $\mathrm{NegBin}(r_i, p_i)$.

To alleviate the lack of a variance estimate and the extreme bias at high HpaII counts (low methylation), we consider a Bayesian hierarchical model approach.
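The generative model described above (an unobserved MspI-like count per site, thinned by a factor of one minus the methylation level to give the HpaII count) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: it uses the Poisson variant only, it draws a fresh unobserved count for every technical replicate, and the estimator `truncated_proportional_estimate` is a hypothetical ratio estimate clipped to [0, 1], suggested by the section title, since the exact formula is not given in this excerpt. All function names are invented for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_binomial(n, p, rng):
    """Draw one Binomial(n, p) variate as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_region(lams, m, n_reps, rng):
    """Simulate one region with len(lams) sites and methylation level m.

    Returns (x, y): x[i] is the observed MspI count at site i, and y[i][j]
    is the HpaII count at site i in technical replicate j.
    Assumption: each replicate gets its own unobserved Poisson(lam_i) draw.
    """
    x = [sample_poisson(lam, rng) for lam in lams]
    y = []
    for lam in lams:
        row = []
        for _ in range(n_reps):
            x_unobserved = sample_poisson(lam, rng)
            # HpaII cuts only unmethylated sites, so keep a fraction (1 - m).
            row.append(sample_binomial(x_unobserved, 1.0 - m, rng))
        y.append(row)
    return x, y


def truncated_proportional_estimate(x, y):
    """Hypothetical estimate: 1 - (HpaII / MspI), clipped to [0, 1].

    HpaII counts are averaged over replicates, then pooled over sites.
    """
    total_x = sum(x)
    if total_x == 0:
        raise ValueError("no MspI tags observed in this region")
    pooled_y = sum(sum(row) / len(row) for row in y)
    return min(1.0, max(0.0, 1.0 - pooled_y / total_x))


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = simulate_region([30.0] * 50, m=0.7, n_reps=3, rng=rng)
    print(truncated_proportional_estimate(x, y))
```

The clipping step matters because sampling noise can make the HpaII count exceed the MspI count, which would otherwise give a negative methylation level. A point estimate of this kind also carries no measure of uncertainty, which is the gap the Bayesian hierarchical model is meant to fill.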

Results
Introduction
Bayesian Hierarchical Model
