Abstract

Background: Regulatory regions (e.g. promoters and enhancers) play an essential role in human development and disease. Many computational approaches have been developed to predict regulatory regions using genomic features such as sequence motifs and evolutionary conservation. However, these DNA sequence-based approaches do not reflect the tissue-specific nature of regulatory regions. In this work, we propose to predict regulatory regions using multiple features derived from DNA methylation profiles.

Results: We discovered several interesting features of the methylated CpG (mCpG) sites within regulatory regions. First, the hypomethylated status of CpGs within regulatory regions, compared to the genomic background methylation level, extended more than 1000 bp from the center of the regulatory regions, demonstrating a high degree of correlation between the methylation statuses of neighboring mCpG sites. Second, when a regulatory region was inactive, as determined by histone mark differences between cell lines, the methylation level of its mCpG sites increased from a hypomethylated state to a hypermethylated state, reaching a level even higher than the genomic background. Third, a distinct set of sequence motifs was overrepresented around mCpG sites within regulatory regions. Using 5 types of features derived from DNA methylation profiles, we were able to predict promoters and enhancers with a machine-learning approach (a support vector machine). Prediction performed well for both promoters and enhancers, with an area under the ROC curve (AUC) of 0.992 and 0.817, respectively, which is better than prediction based on methylation level alone, especially for enhancers.

Conclusions: Our study suggests that DNA methylation features of mCpG sites can be used to predict regulatory regions.
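For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It uses the rank-based (Mann-Whitney) formulation: the AUC is the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study; the authors' actual evaluation pipeline is not described here.

```python
def roc_auc(labels, scores):
    """Rank-based estimate of the area under the ROC curve.

    labels: sequence of 0/1 class labels (1 = regulatory region, hypothetical)
    scores: sequence of classifier scores, higher = more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives, three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.992 and 0.817 should be read. The pairwise loop is quadratic and is meant only for illustration on small inputs.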

Highlights

  • Transcriptional regulation plays an important role in most biological processes

  • We focus on two major regulatory regions: promoters and enhancers

  • Using ChIP-seq and predicted data for several transcription factors (TFs) [23], we found that methylation levels were lowest at the center of TF binding sites and reached the background level by 1500 bp (Figure 3C), even though the majority of binding sites are less than 20 bp in length
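The distance profile described in the last highlight can be illustrated with a small sketch that averages CpG methylation levels in bins of distance from a region center. This is an assumption-laden illustration, not the authors' method: the function name `mean_methylation_by_distance`, the bin size, the distance cutoff, and the toy CpG positions and levels are all hypothetical.

```python
from collections import defaultdict


def mean_methylation_by_distance(cpg_sites, center, bin_size=500, max_dist=2000):
    """Average methylation level per distance bin around a region center.

    cpg_sites: iterable of (genomic_position, methylation_level in [0, 1])
    center:    genomic position of the region center (e.g. a TF binding site)
    Returns a dict mapping bin start (in bp from the center) to mean level.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, level in cpg_sites:
        dist = abs(pos - center)
        if dist > max_dist:
            continue  # ignore CpGs outside the window of interest
        bin_start = (dist // bin_size) * bin_size
        sums[bin_start] += level
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(counts)}


# Hypothetical toy data: low methylation near the center, rising outward.
sites = [
    (10010, 0.05), (9900, 0.15),   # within 500 bp of the center
    (10600, 0.40), (9300, 0.60),   # 500-999 bp away
    (11600, 0.80), (8400, 0.80),   # 1500-1999 bp away
    (13000, 0.90),                 # beyond max_dist, skipped
]
profile = mean_methylation_by_distance(sites, center=10000)
for bin_start, mean_level in profile.items():
    print(f"{bin_start:>5} bp: {mean_level:.2f}")
```

Plotting such a profile for real data against the genome-wide mean would show where the local level returns to background; the sketch folds upstream and downstream distances together, which assumes the profile is roughly symmetric.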


Summary

Introduction

The interactions between transcription factors and regulatory regions, such as promoters and enhancers, are essential in transcriptional regulation. Experimental and computational approaches have been developed to identify regulatory regions on a genome-wide scale. Histone marks have been measured on a genome-wide scale using the ChIP-seq technique [5,6,7]. These histone marks are good predictors of regulatory regions. Regulatory regions (e.g. promoters and enhancers) play an essential role in human development and disease. Many computational approaches have been developed to predict regulatory regions using genomic features such as sequence motifs and evolutionary conservation. These DNA sequence-based approaches do not reflect the tissue-specific nature of regulatory regions. We propose to predict regulatory regions using multiple features derived from DNA methylation profiles.

