Abstract

As the fifth base in the mammalian genome, 5-methylcytosine (5 mC) is essential for many biological processes, including normal development and disease. Methylated DNA immunoprecipitation sequencing (MeDIP-seq), which uses anti-5 mC antibodies to enrich for the methylated fraction of the genome, is widely used to investigate the methylome at a resolution of 100–500 bp. To address the CpG density-dependent bias and limited resolution of MeDIP-seq, we developed MeSiC, a method based on Random Forest Regression (RFR), to estimate DNA methylation levels at single-base resolution. MeSiC integrates the MeDIP-seq signals of each CpG site and its surrounding neighbors, together with genomic features, to construct genomic element-dependent RFR models. In the H1 cell line, MeSiC predictions were highly correlated with actual 5 mC levels. MeSiC also calibrated the CpG density-dependent bias of MeDIP-seq signals. Importantly, MeSiC models constructed in the H1 cell line accurately predicted DNA methylation levels in other cell types. In comparisons with methylCRF and MEDIPS, MeSiC achieved comparable or better performance. These results demonstrate that MeSiC can provide accurate estimates of 5 mC levels at single-CpG resolution using MeDIP-seq data alone.
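The abstract describes MeSiC's inputs only at a high level. As an illustration, the following minimal sketch assembles a feature vector for each CpG from its own MeDIP-seq signal, the signals of neighboring CpGs within a flanking window, and local CpG density, then groups the vectors by genomic element so that a separate regressor could be trained per element. The function name `build_features`, the 100 bp flank, and the exact feature set are hypothetical choices, not MeSiC's published configuration. Each group would then be fit against BS-seq methylation levels with a random forest regressor (for example scikit-learn's `RandomForestRegressor`), which is omitted here to keep the sketch dependency-free.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_features(positions, signals, elements, flank=100):
    """Group per-CpG feature vectors by genomic element.

    positions: sorted CpG coordinates on one chromosome (assumed input)
    signals:   normalized MeDIP-seq signal at each CpG
    elements:  genomic element label per CpG, e.g. "CGI" or "exon"
    flank:     HYPOTHETICAL half-window (bp) used to collect neighbors
    Returns {element: [(feature_vector, position), ...]}.
    """
    by_element = defaultdict(list)
    for i, pos in enumerate(positions):
        # Indices of all CpGs within [pos - flank, pos + flank].
        lo = bisect_left(positions, pos - flank)
        hi = bisect_right(positions, pos + flank)
        neighbors = [signals[j] for j in range(lo, hi) if j != i]
        # Local CpG density: CpGs per bp in the window, including this CpG.
        density = (hi - lo) / (2 * flank + 1)
        mean_neighbor = sum(neighbors) / len(neighbors) if neighbors else 0.0
        features = [signals[i], mean_neighbor, float(len(neighbors)), density]
        by_element[elements[i]].append((features, pos))
    return dict(by_element)


if __name__ == "__main__":
    groups = build_features([100, 150, 400], [1.0, 3.0, 5.0], ["CGI", "CGI", "exon"])
    for element, rows in groups.items():
        print(element, rows)
```

Training one model per element, rather than a single genome-wide model, mirrors the "genomic element-dependent" design the abstract names; the window-based density feature gives the model a direct handle on the CpG density-dependent bias.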

Highlights

  • The field of DNA methylation is capitalizing on sequencing technology

  • By comparing the distribution of 5 mC levels with that of MeDIP-seq-derived reads per million (RPM) values, we found that highly methylated CpG sites were mainly enriched in windows with low CpG density, but this enrichment was not observed in the MeDIP-seq profile (Fig. 1A,B), suggesting that MeDIP-seq data have limited accuracy in low-CpG-density regions

  • The overall Pearson correlation coefficient (PCC) was 0.84 (Fig. 5E), and across distinct genomic elements the PCCs ranged from 0.79 to 0.86 and the concordances from 0.80 to 0.91 (Fig. 5F). These findings indicated that MeSiC predictions were highly consistent with Infinium HumanMethylation[27] microarray data, especially at CpG islands (CGIs), CGI shores, 5′ UTRs and exons, analogous to the observations in the H1 cell line
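The per-element evaluation summarized above can be sketched as follows. PCC is the standard Pearson correlation; the text does not define "concordance", so this sketch assumes it is the fraction of CpGs whose predicted level lies within 0.25 of the measured level. That threshold and the function names `pearson`, `concordance` and `evaluate_by_element` are hypothetical, not taken from MeSiC.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined for a constant input")
    return cov / (sx * sy)


def concordance(pred, actual, tol=0.25):
    """ASSUMED definition: fraction of CpGs with |predicted - measured| <= tol."""
    hits = sum(1 for p, a in zip(pred, actual) if abs(p - a) <= tol)
    return hits / len(pred)


def evaluate_by_element(pred, actual, elements, tol=0.25):
    """Return {element: (PCC, concordance)} for each genomic element label."""
    groups = defaultdict(lambda: ([], []))
    for p, a, e in zip(pred, actual, elements):
        groups[e][0].append(p)
        groups[e][1].append(a)
    return {e: (pearson(p, a), concordance(p, a, tol)) for e, (p, a) in groups.items()}


if __name__ == "__main__":
    pred = [0.1, 0.2, 0.9, 0.8]
    actual = [0.0, 0.3, 1.0, 0.6]
    print(evaluate_by_element(pred, actual, ["CGI", "CGI", "exon", "exon"]))
```

Reporting both metrics is useful because PCC measures linear agreement while a tolerance-based concordance measures absolute agreement; a model can score well on one and poorly on the other.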

Introduction

The field of DNA methylation is capitalizing on sequencing technology. Whole genome bisulfite sequencing (BS-seq)[4,5] is considered the gold standard for quantifying and analyzing genome-wide DNA methylation levels at single-base resolution[6]. Based on the above-mentioned assumption, we proposed MeSiC, a method that uses the Random Forest Regression (RFR) algorithm to model the 5 mC levels of single CpG sites from MeDIP-seq signals and genomic features. A genome browser view of actual and predicted 5 mC levels, together with normalized MeDIP-seq read counts, for CpG sites around this gene is shown at single-base resolution (Fig. 3B); predictions and 5 mC levels showed a high concordance (0.90).
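To make the regression setup concrete, the sketch below pairs an input signal with a training target for each CpG. It assumes the BS-seq methylation level at a CpG is the fraction of bisulfite reads reporting an unconverted (methylated) C, and that the MeDIP-seq count assigned to each CpG is normalized as reads per million (RPM) mapped reads. How reads are assigned to CpGs, the minimum-coverage cutoff of 5, and the function names are hypothetical choices for illustration, not MeSiC's documented preprocessing.

```python
def bs_seq_level(methylated_reads, total_reads, min_coverage=5):
    """Methylation level = methylated / total bisulfite reads.

    Returns None when coverage is below the HYPOTHETICAL min_coverage cutoff.
    """
    if total_reads < min_coverage:
        return None
    return methylated_reads / total_reads


def rpm(count, total_mapped_reads):
    """Normalize a MeDIP-seq read count to reads per million mapped reads."""
    return count * 1e6 / total_mapped_reads


def training_pairs(cpgs, total_mapped_reads, min_coverage=5):
    """Build (RPM signal, BS-seq level) pairs, skipping low-coverage CpGs.

    cpgs: iterable of (medip_count, methylated_reads, total_bs_reads) per CpG.
    """
    pairs = []
    for medip_count, meth, total in cpgs:
        level = bs_seq_level(meth, total, min_coverage)
        if level is None:
            continue
        pairs.append((rpm(medip_count, total_mapped_reads), level))
    return pairs


if __name__ == "__main__":
    print(training_pairs([(10, 8, 10), (5, 1, 2)], 10_000_000))
```

Filtering low-coverage CpGs keeps noisy BS-seq estimates (for example a level of 0.5 from only two reads) out of the training targets.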
