Abstract
Background: The traditional approach to studying the epigenetic mechanism CpG methylation in tissue samples is to identify regions of concordant differential methylation spanning multiple CpG sites (differentially methylated regions). Variation limited to single or small numbers of CpGs has been assumed to reflect stochastic processes. To test this, we developed software, Cluster-Based analysis of CpG methylation (CluBCpG), and explored variation in read-level CpG methylation patterns in whole genome bisulfite sequencing data.

Results: Analysis of both human and mouse whole genome bisulfite sequencing datasets reveals read-level signatures associated with cell type and cell type-specific biological processes. These signatures, which are mostly orthogonal to classical differentially methylated regions, are enriched at cell type-specific enhancers and allow estimation of proportional cell composition in synthetic mixtures and improved prediction of gene expression. In tandem, we developed a machine learning algorithm, Precise Read-Level Imputation of Methylation (PReLIM), to increase coverage of existing whole genome bisulfite sequencing datasets by imputing CpG methylation states on individual sequencing reads. PReLIM both improves CluBCpG coverage and performance and enables identification of novel differentially methylated regions, which we independently validate.

Conclusions: Our data indicate that, rather than stochastic variation, read-level CpG methylation patterns in tissue whole genome bisulfite sequencing libraries reflect cell type. Accordingly, these new computational tools should lead to an improved understanding of epigenetic regulation by DNA methylation.
Highlights
The traditional approach to studying the epigenetic mechanism CpG methylation in tissue samples is to identify regions of concordant differential methylation spanning multiple CpG sites
To provide a new approach for identifying read-level DNA methylation patterns within whole genome bisulfite sequencing (WGBS) data, we developed a software package called Cluster-Based analysis of CpG methylation (CluBCpG)
CluBCpG operates on the BAM files generated by mapping WGBS reads with Bismark [15] and standard preprocessing tools
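As a rough illustration of what read-level input looks like, the sketch below extracts per-read CpG methylation calls from a Bismark methylation-call string (the XM tag that Bismark writes to each aligned read, where `Z` marks a methylated CpG and `z` an unmethylated CpG). This is not CluBCpG's own code or API. The function names and the example string are hypothetical, and offsets are positions within the read, not genomic coordinates.

```python
def cpg_calls_from_xm(xm_tag):
    """Return (read_offset, state) pairs for each CpG call in a Bismark XM string.

    state is 1 for a methylated CpG ('Z') and 0 for an unmethylated CpG ('z').
    Non-CpG contexts ('X', 'x', 'H', 'h') and non-cytosine positions ('.')
    are skipped.
    """
    calls = []
    for offset, symbol in enumerate(xm_tag):
        if symbol == "Z":
            calls.append((offset, 1))
        elif symbol == "z":
            calls.append((offset, 0))
    return calls


def cpg_pattern(xm_tag):
    """Collapse the CpG calls of one read into a pattern string such as '101'."""
    return "".join(str(state) for _, state in cpg_calls_from_xm(xm_tag))


# Hypothetical XM string for a single read (illustration only).
example_xm = "..Z...z..h.Z.x"
calls = cpg_calls_from_xm(example_xm)
pattern = cpg_pattern(example_xm)
```

In practice the XM string would be read from the BAM record with a BAM parsing library, and read offsets would be converted to genomic CpG positions using the alignment before reads are compared.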
Summary
The traditional approach to studying the epigenetic mechanism CpG methylation in tissue samples is to identify regions of concordant differential methylation spanning multiple CpG sites (differentially methylated regions). Epipolymorphism [5] quantifies the proportion of distinct read-level patterns of methylation (referred to as epi-haplotypes) by calculating the probability that two randomly sampled reads contain different epi-haplotypes. Another metric, methylation entropy [10], assesses read-level heterogeneity by an “information-theoretic” approach based on Shannon’s entropy. Methylation haplotype load (MHL) is a potential improvement over epipolymorphism and methylation entropy because it can distinguish different combinations of methylation haplotypes, but it fails to distinguish cell type-specific methylation patterns in certain contexts (Supplementary Figure 1). None of these metrics captures the numbers, proportions, or specific patterns of unique epi-haplotypes. Other attempts at exploring read-level information in WGBS data sets have focused on minority regions such as those exhibiting “bipolar methylation” [13] or containing “hypo-methylated alleles” [14], rather than assessing the full breadth and depth of potential cell type-specific signals.
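To make the limitation of these summary metrics concrete, here is a minimal sketch of epipolymorphism and methylation entropy computed on toy reads. It assumes each read is encoded as a string of 0/1 CpG states, that epipolymorphism is computed as one minus the sum of squared epi-haplotype frequencies (sampling with replacement), and that entropy is normalized by the number of CpGs in the window. These conventions and all function names are assumptions for illustration and may differ in detail from the cited definitions.

```python
from collections import Counter
from math import log2


def haplotype_frequencies(reads):
    """Map each distinct epi-haplotype to its fraction of the reads."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def epipolymorphism(reads):
    """Probability that two reads drawn with replacement differ in pattern."""
    freqs = haplotype_frequencies(reads)
    return 1.0 - sum(p * p for p in freqs.values())


def methylation_entropy(reads):
    """Shannon entropy of epi-haplotype frequencies, divided by CpG count.

    The normalization by CpG count is an assumption of this sketch.
    """
    n_cpgs = len(reads[0])
    freqs = haplotype_frequencies(reads)
    shannon = -sum(p * log2(p) for p in freqs.values())
    return shannon / n_cpgs


# Two toy windows built from entirely different epi-haplotypes.
window_a = ["1111", "1111", "0000", "0000"]
window_b = ["1100", "1100", "0011", "0011"]

scores_a = (epipolymorphism(window_a), methylation_entropy(window_a))
scores_b = (epipolymorphism(window_b), methylation_entropy(window_b))
```

Both windows receive identical scores even though they share no epi-haplotype, because each metric depends only on the frequency distribution and not on which patterns are present.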