Abstract

Motivation: DNA methylation is an intensely studied epigenetic mark implicated in many biological processes of direct clinical relevance. Although sequencing-based technologies are increasingly allowing high-resolution measurements of DNA methylation, statistical modelling of such data is still challenging. In particular, statistical identification of differentially methylated regions across different conditions poses unresolved challenges in accounting for spatial correlations within the statistical testing procedure.

Results: We propose a non-parametric, kernel-based method, M3D, to detect higher order changes in methylation profiles, such as shape, across pre-defined regions. The test statistic explicitly accounts for differences in coverage levels between samples, thus handling in a principled way a major confounder in the analysis of methylation data. Empirical tests on real and simulated datasets show an increased power compared to established methods, as well as considerable robustness with respect to coverage and replication levels.

Availability and implementation: R/Bioconductor package M3D.

Contact: G.Sanguinetti@ed.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.


Introduction

DNA methylation is an epigenetic mark associated with many fundamental biological processes of direct clinical relevance, such as imprinting, retrotransposon silencing and cell differentiation (Gopalakrishnan et al., 2008; Laurent et al., 2010). The resulting counts of cytosine and thymine at registered cytosine loci form the basis of further analysis. This general procedure has been adapted in various ways, with reduced representation bisulfite sequencing (RRBS) being one of the most widely used. RRBS involves using a restriction enzyme such as MspI (or TaqI) to cleave the DNA at CCGG (or TCGA) loci and selecting short reads for sequencing (Gu et al., 2011). This results in greater coverage of CpG-dense regions at lower cost.
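The RRBS steps described above can be illustrated with a minimal in-silico sketch. It is not the protocol of Gu et al. or any part of the M3D package: the function names, the toy sequence and the size window are all hypothetical, and the cut position (after the first C of CCGG) and the per-locus methylation proportion C / (C + T) are stated here as assumptions based on common convention rather than taken from the text.

```python
import re

# Assumption: MspI recognises CCGG and cuts after the first C (C^CGG).
MSPI_SITE = "CCGG"


def digest(sequence, site=MSPI_SITE, cut_offset=1):
    """Split a sequence at every occurrence of `site`.

    The cut is placed `cut_offset` bases into the recognition site.
    """
    seq = sequence.upper()
    # A lookahead pattern reports every site start, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def methylation_level(c_count, t_count):
    """Proportion of reads showing C at a cytosine locus.

    Assumption: after bisulfite conversion a C read indicates a methylated
    cytosine and a T read an unmethylated one. Returns None with no coverage.
    """
    total = c_count + t_count
    return None if total == 0 else c_count / total


# Toy example with a hypothetical sequence and size window.
toy = "AAACCGGTTTTCCGGAA"
fragments = digest(toy)
selected = size_select(fragments, min_len=5, max_len=8)
```

Returning `None` for a locus with no reads keeps zero coverage distinct from zero methylation, which matters because coverage differs between samples.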

