Abstract

DNA methylation is one of the most studied epigenetic modifications, with roles ranging from transcriptional regulation to aging, and can be assessed at single base-pair resolution by bisulfite sequencing (BS-seq) or enzymatic methyl sequencing (EM-seq). The permutations of methylation statuses observed on aligned reads reflect the methylation patterns of individual cells. These patterns at specific genomic locations are thought to indicate cellular heterogeneity within a cell population, which is predictive of development and disease; methylation heterogeneity therefore has potential for the early detection of such changes. Computational methods have been developed to assess methylation heterogeneity using methylation patterns formed by four consecutive CpGs, but shotgun sequencing often yields only partially observed patterns, which leaves very limited data for downstream analysis. While many programs have been developed to impute genome-wide methylation levels, there is currently only one method for recovering partially observed methylation patterns, and it requires a large amount of training data and cannot be used directly. We therefore developed a probabilistic imputation method that uses information from neighbouring sites to recover partially observed methylation patterns quickly. For data of moderate depth, it allows methylation heterogeneity to be evaluated at 15% more regions genome-wide with high accuracy. To make the method more user-friendly, we also provide a computational pipeline for genome-wide screening that both evaluates methylation levels and profiles methylation patterns for all cytosine contexts, which is the first of its kind.
Our method allows accurate estimation of methylation levels and makes the evaluation of methylation heterogeneity possible for much more data with reasonable coverage. This has important implications for using methylation heterogeneity to monitor previously undetectable changes within cellular populations when assessing development and disease.

Highlights

  • Methylation is one of the most studied epigenetic modifications (Moore et al, 2013)

  • Only one existing method recovers methylation patterns, which is beneficial for evaluating methylation heterogeneity. That program is standalone and only imputes (completes) a binary matrix of indicator variables representing the methylation statuses within a window of a given number of CpGs; users must extract the windows for training and prediction themselves and produce the output needed for downstream analyses

  • Our program (Figure 4) can screen for methylation patterns genome-wide, impute missing statuses, and output the methylation status profile at each cytosine together with the copy number of every possible methylation pattern for a given window size
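
To make the "copy number of every possible methylation pattern" in the highlight above concrete, the following is a minimal sketch and not the authors' program. It assumes a hypothetical encoding in which each read covering a window is a string over '1' (methylated), '0' (unmethylated) and '?' (status not observed); the function name `count_patterns` and this encoding are illustrative assumptions only.

```python
from itertools import product


def count_patterns(reads, window=4):
    """Count copies of each fully observed methylation pattern in one window.

    `reads` is a list of strings of length `window` over '1', '0' and '?'
    (hypothetical encoding: methylated, unmethylated, not observed).
    Returns (counts, n_partial): counts maps all 2**window patterns to their
    copy number; n_partial is the number of partially observed reads.
    """
    # Start every possible pattern at zero so absent patterns are reported too.
    counts = {"".join(p): 0 for p in product("01", repeat=window)}
    n_partial = 0
    for read in reads:
        if len(read) != window or set(read) - set("01?"):
            raise ValueError(f"unexpected read encoding: {read!r}")
        if "?" in read:
            # Partially observed: unusable for pattern counting unless imputed.
            n_partial += 1
        else:
            counts[read] += 1
    return counts, n_partial


if __name__ == "__main__":
    counts, n_partial = count_patterns(["1111", "1111", "0000", "10?1"])
    print(counts["1111"], counts["0000"], n_partial)
```

With a window of four CpGs there are 16 possible patterns; reads containing an unobserved status are set aside here, which is the data loss that imputation is meant to reduce.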

Summary

INTRODUCTION

Methylation is one of the most studied epigenetic modifications (Moore et al, 2013). It is known to be involved in a wide range of key biological processes, including regulation of gene expression, development (Hsieh et al, 2020), aging, and silencing of transposable elements (Jin et al, 2011). Melissa (Kapourani and Sanguinetti, 2019) and DeepCpG (Angermueller et al, 2017) were developed to impute methylation levels in single-cell methylomes. Despite their usefulness in inferring methylation levels genome-wide, they were not designed to recover, and cannot recover, the read-specific methylation patterns needed to estimate methylation heterogeneity, because that estimation requires the read identity of each methylation status. Methylation is highly correlated among nearby cytosines (Affinito et al, 2020). We exploit this property extensively to borrow as much information as possible from nearby sites, and developed a probabilistic imputation method that imputes methylation statuses accurately and quickly. It is easier to use, can be run with a single command, and outputs results ready for downstream analyses.
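
The idea of borrowing information from nearby sites can be illustrated with a small sketch. This is not the method described in the paper, whose details are not given in this excerpt; it is a simplified stand-in that assumes reads over one window are lists of 1 (methylated), 0 (unmethylated) or None (not observed), and that a missing status is predicted from the nearest observed site on the same read using co-occurrence counts from other reads. The function name `impute_window` and the `pseudocount` parameter are hypothetical.

```python
def impute_window(reads, pseudocount=1.0):
    """Fill missing statuses in one window using the nearest observed neighbour.

    `reads` is a list of equal-length lists holding 1, 0 or None (assumed
    encoding). For a missing site j on a read, take the nearest observed site k
    on that read and estimate P(site j methylated | status at k) from all reads
    that observe j and share the same status at k, with add-`pseudocount`
    smoothing. The more probable status is imputed.
    Returns a new list of reads; the input is not modified.
    """
    imputed = [list(r) for r in reads]
    for r_idx, read in enumerate(reads):
        observed = [i for i, s in enumerate(read) if s is not None]
        if not observed:
            continue  # nothing to condition on, so the read is left as-is
        for j, status in enumerate(read):
            if status is not None:
                continue
            # Nearest observed site on the same read (leftmost on ties).
            k = min(observed, key=lambda i: abs(i - j))
            meth = unmeth = 0
            for other in reads:
                # Use only reads that observe j and agree with this read at k.
                if other[j] is None or other[k] != read[k]:
                    continue
                if other[j] == 1:
                    meth += 1
                else:
                    unmeth += 1
            p_meth = (meth + pseudocount) / (meth + unmeth + 2 * pseudocount)
            imputed[r_idx][j] = 1 if p_meth >= 0.5 else 0
    return imputed


if __name__ == "__main__":
    window_reads = [
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 1, None, 1],
        [None, 0, 0, 0],
    ]
    for row in impute_window(window_reads):
        print(row)
```

Conditioning on a single neighbour keeps the sketch short; a fuller model could combine several neighbouring sites or weight them by genomic distance, but those choices are not specified in the text above.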

METHODS
RESULT
Imputation Predicts Methylation Statuses Accurately
DISCUSSION
Findings
DATA AVAILABILITY STATEMENT