Statistical Quantification of Methylation Levels by Next-Generation Sequencing

Guodong Wu, Degui Zhi, Devin Absher, Nengjun Yi, Janet Kelso

doi:10.1371/journal.pone.0021034

Abstract

Background/Aims: Recently, next-generation sequencing-based technologies have enabled DNA methylation profiling at high resolution and low cost. Methyl-Seq and Reduced Representation Bisulfite Sequencing (RRBS) are two such technologies that interrogate methylation levels at CpG sites throughout the entire human genome. With the rapid reduction of sequencing costs, these technologies will enable epigenotyping of large cohorts for phenotypic association studies. Existing quantification methods for sequencing-based methylation profiling are simplistic and do not deal with the noise due to the random sampling nature of sequencing and various experimental artifacts. Therefore, there is a need to investigate the statistical issues related to the quantification of methylation levels for these emerging technologies, with the goal of developing an accurate quantification method.

Methods: In this paper, we propose two methods for Methyl-Seq quantification. The first method, the Maximum Likelihood estimate, is both conceptually intuitive and computationally simple. However, this estimate is biased at extreme methylation levels and does not provide variance estimation. The second method, based on a Bayesian hierarchical model, allows variance estimation of methylation levels and provides a flexible framework to adjust for technical bias in the sequencing process.

Results: We compare the previously proposed binary method, the Maximum Likelihood (ML) method, and the Bayesian method. In both simulation and real data analysis of Methyl-Seq data, the Bayesian method offers the most accurate quantification. The ML method is slightly less accurate than the Bayesian method, but both of our proposed methods outperform the original binary method in Methyl-Seq. In addition, we applied these quantification methods to simulation data and show that, with a sequencing depth above 40–300 per cleavage site (the threshold varies across tissue samples), Methyl-Seq offers quantification consistency comparable to that of microarrays.

Highlights

We propose two methods for Methyl-Seq quantification.
DNA methylation is an epigenetic regulatory mechanism implicated in various human diseases [1,2]. Cytosine nucleotides in DNA molecules, primarily in the CpG context, may be methylated, and changes in DNA methylation status can modulate the expression levels of genes [3,4,5,6,7] and phenotype [8,9,10,11]. In the past, measurement of DNA methylation was only feasible and affordable for a small number of individuals at a limited number of sites.
The Maximum Likelihood (ML) method is slightly less accurate than the Bayesian method. Both of our proposed methods outperform the original binary method in Methyl-Seq. We applied these quantification methods to simulation data and show that, with a sequencing depth above 40–300 per cleavage site, Methyl-Seq offers quantification consistency comparable to that of microarrays.

Summary

Methods

Using next-generation sequencing, Methyl-Seq assays over 250,000 methylation-sensitive restriction enzyme cleavage sites grouped into over 90,000 regions.

2.3 Truncated Proportional Estimate

We assume the following model for generating the tag counts in a Methyl-Seq experiment. For a region with $K$ assayable CCGG sites, the HpaII tag count at the $i$-th site in the $j$-th technical replicate, $y_{ij}$, is generated by first generating $x'_i$ total sequencing tags and then subsampling them by a fraction $(1-m)$, where $m$ is the methylation level of the region. In other words, $y_{ij} \mid x'_i \sim \mathrm{Binomial}(x'_i,\, 1-m)$, where $x'_i$ is the corresponding unobserved MspI tag count, generated from the same distribution as $x_i$: $\mathrm{Poisson}(\lambda_i)$ or $\mathrm{NegBin}(r_i, p_i)$.

To address the lack of a variance estimate and the extreme bias when HpaII counts are high (low methylation), we consider a Bayesian hierarchical model approach.
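The tag-count model described above can be illustrated with a short simulation. The following Python sketch is not the authors' implementation. It assumes the Poisson variant of the model, assumes a fresh unobserved count is drawn for every technical replicate, and assumes that a proportional estimate takes the moment-based form of one minus the ratio of HpaII to MspI tags, truncated to [0, 1]; the excerpt does not give the exact formula. All function names are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_binomial(rng, n, p):
    """Draw one Binomial(n, p) variate as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_region(rng, lambdas, m, replicates):
    """Simulate MspI counts x[i] and HpaII counts y[i][j] for one region.

    lambdas holds one Poisson rate per CCGG site; m is the region's
    methylation level. Assumption: a new unobserved total x' is drawn
    for each replicate before binomial thinning with probability 1 - m.
    """
    x = [sample_poisson(rng, lam) for lam in lambdas]
    y = []
    for lam in lambdas:
        row = []
        for _ in range(replicates):
            x_prime = sample_poisson(rng, lam)  # unobserved MspI-like total
            row.append(sample_binomial(rng, x_prime, 1.0 - m))
        y.append(row)
    return x, y


def truncated_proportional_estimate(x, y):
    """Assumed estimator: 1 - (replicate-averaged HpaII tags / MspI tags), clipped to [0, 1].

    Returns None when the region has no MspI tags at all.
    """
    total_x = sum(x)
    if total_x == 0:
        return None
    total_y = sum(sum(row) / len(row) for row in y)
    return min(1.0, max(0.0, 1.0 - total_y / total_x))


# Usage: 20 sites, mean depth 50 per site, 3 replicates, true m = 0.7.
rng = random.Random(0)
x, y = simulate_region(rng, [50.0] * 20, 0.7, 3)
estimate = truncated_proportional_estimate(x, y)
print(f"estimated methylation level: {estimate:.3f}")
```

The clipping step shows why such an estimate can be biased at extreme methylation levels: when HpaII counts exceed MspI counts by chance, the raw ratio falls outside [0, 1] and is forced to the boundary.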

Journal: PLoS ONE	Publication Date: Jun 15, 2011
Citations: 25	License type: CC BY 4.0


