Abstract

In the study of DNA methylation, genetic variation between species, strains or individuals can result in CpG sites that are exclusive to a subset of samples, and insertions and deletions can rearrange the spatial distribution of CpGs. How to account for this variation when analyzing the interplay between sequence variation and DNA methylation is not well understood, especially when the number of CpG differences between samples is large. Here, we use whole-genome bisulfite sequencing data from two highly divergent mouse strains to study this problem. We show that alignment to personal genomes is necessary for valid methylation quantification. We introduce a method for including strain-specific CpGs in differential analysis and show that this increases power. Our method uses smoothing to impute methylation levels at strain-specific sites, thereby allowing strain-specific CpGs to contribute to the analysis while accounting for differences in the spatial occurrence of CpGs. We also apply our method to a human normal-cancer dataset and show that it improves accuracy and power there as well, illustrating the broad applicability of our approach. Our results have implications for the joint analysis of genetic variation and DNA methylation using bisulfite-converted DNA, and unlock the use of personal genomes for addressing this question.
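The imputation idea in the abstract can be illustrated with a deliberately simplified sketch. It assumes that the CpG positions of both strains have already been placed on a common coordinate system (for example, through an alignment of the two personal genomes), and it uses a Gaussian-kernel weighted average as a stand-in smoother. BSmooth itself fits a local likelihood smoother, so this is not the authors' method. The function names, the toy data and the `bandwidth` parameter are hypothetical.

```python
import numpy as np

def split_cpgs(pos_a, pos_b):
    """Split CpG positions (assumed to be in shared coordinates) into
    shared, strain-A-specific and strain-B-specific sets."""
    set_a, set_b = set(pos_a), set(pos_b)
    return sorted(set_a & set_b), sorted(set_a - set_b), sorted(set_b - set_a)

def smooth_at(targets, obs_pos, obs_meth, bandwidth=500.0):
    """Gaussian-kernel weighted mean of observed methylation, evaluated at targets.

    A simplified stand-in for a proper smoother such as BSmooth's local likelihood fit.
    With a small bandwidth, targets far from every observed CpG can make all
    weights underflow to 0, which yields NaN.
    """
    targets = np.asarray(targets, dtype=float)
    obs_pos = np.asarray(obs_pos, dtype=float)
    obs_meth = np.asarray(obs_meth, dtype=float)
    # w[i, j] is the weight of observed CpG j when estimating target i
    dist = (targets[:, None] - obs_pos[None, :]) / bandwidth
    w = np.exp(-0.5 * dist ** 2)
    return (w @ obs_meth) / w.sum(axis=1)

# Hypothetical toy data: CpG positions and methylation levels per strain
pos_a, meth_a = [100, 250, 400, 900], [0.9, 0.8, 0.85, 0.2]
pos_b, meth_b = [100, 400, 600, 900], [0.7, 0.6, 0.5, 0.1]

shared, only_a, only_b = split_cpgs(pos_a, pos_b)
print(shared, only_a, only_b)  # [100, 400, 900] [250] [600]

# Impute strain B's methylation at the A-specific CpG from B's own CpGs
imputed_b = smooth_at(only_a, pos_b, meth_b)
print(imputed_b)
```

The kernel smoother keeps the example short; a smoother that models binomial read counts, as BSmooth does, would also weight each CpG by its coverage.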

Highlights

  • To further characterize regions identified by each analysis, we examined enrichment of overlaps between differentially methylated regions (DMRs) and various functional marks obtained from ENCODE, as well as genomic features of interest such as RefSeq gene promoters and CpG islands

  • We have studied the problem of comparing DNA methylation data between divergent genotypes, at the level of strain differences between inbred mice
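The overlap-enrichment step mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the analysis used in the study: regions are hypothetical half-open `(chrom, start, end)` tuples, "enrichment" is taken to mean the ratio of the DMR overlap rate to the overlap rate of a background region set, and no statistical test is performed.

```python
def overlaps_any(region, features):
    """True if a half-open (chrom, start, end) region overlaps any feature interval."""
    chrom, start, end = region
    return any(c == chrom and s < end and start < e for c, s, e in features)

def overlap_rate(regions, features):
    """Fraction of regions that overlap at least one feature."""
    return sum(overlaps_any(r, features) for r in regions) / len(regions)

def fold_enrichment(dmrs, background, features):
    """DMR overlap rate divided by background overlap rate (undefined if the latter is 0)."""
    return overlap_rate(dmrs, features) / overlap_rate(background, features)

# Hypothetical toy intervals
features = [("chr1", 1000, 2000), ("chr2", 500, 800)]
dmrs = [("chr1", 1500, 1600), ("chr1", 3000, 3100), ("chr2", 700, 900), ("chr3", 0, 100)]
background = [("chr1", 0, 100), ("chr1", 1900, 2100), ("chr1", 5000, 5100), ("chr2", 0, 100)]

print(overlap_rate(dmrs, features))                 # 0.5
print(overlap_rate(background, features))           # 0.25
print(fold_enrichment(dmrs, background, features))  # 2.0
```

The pairwise scan is quadratic in the number of intervals; genome-scale analyses typically rely on interval-indexed tools such as bedtools instead.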


Summary

Introduction

DNA methylation is a key epigenetic mark that has become widely implicated in human development and disease [1, 2]. One gold-standard method for measuring it is whole-genome bisulfite sequencing (WGBS), which pairs bisulfite conversion of unmethylated cytosine residues with next-generation sequencing [3]. At each CpG site in the reference genome, an aligned read is called unmethylated if it reads TG at the site (indicating bisulfite conversion) and methylated if it reads CG (indicating protection by the methyl group). Statistical packages such as BSmooth [4] can integrate these data across larger regions to estimate and compare overall methylation patterns between sample groups.
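The per-read calling rule described above can be sketched as a small function. It is a simplification that assumes the bases covering the C of a CpG on the top strand have already been extracted from the alignments. Real pipelines must also handle the bottom strand (where methylation is observed at the G position as G versus A), base qualities and PCR duplicates. The function name `call_cpg` is hypothetical.

```python
from collections import Counter

def call_cpg(read_bases):
    """Summarize reads at the C of a top-strand CpG.

    'C' -> methylated (protected from bisulfite conversion)
    'T' -> unmethylated (converted)
    Any other base (sequencing error, SNP, 'N') is ignored.
    Returns (methylated count, unmethylated count, methylation level).
    """
    counts = Counter(b.upper() for b in read_bases)
    m, u = counts["C"], counts["T"]
    # Methylation level is undefined (NaN) when no informative reads cover the site
    level = m / (m + u) if m + u > 0 else float("nan")
    return m, u, level

print(call_cpg("CCTCTN"))  # (3, 2, 0.6)
```

Here three reads support methylation, two support conversion, and the `N` is ignored. A genomic C-to-T change at the site would also produce T in every read, which is one reason the choice of reference genome matters for quantification.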

