Abstract

Non-methylated islands (NMIs) of DNA are genomic regions that are important for gene regulation and development. A recent study of genome-wide non-methylation data in vertebrates by Long et al. (eLife 2013;2:e00348) has shown that many experimentally identified non-methylated regions do not overlap with classically defined CpG islands, which are computationally predicted using simple DNA sequence features. This is especially true in cold-blooded vertebrates such as Danio rerio (zebrafish). To investigate how predictive DNA sequence is of a region’s methylation status, we applied a supervised learning approach using a spectrum kernel support vector machine, to test whether a more complex model can improve non-methylated island prediction and help characterize the sequence properties of these regions. We demonstrate that DNA sequence is highly predictive of methylation status and that, in contrast to existing CpG island prediction methods, our method provides more useful predictions of NMIs genome-wide in all vertebrate organisms that were studied. Our results also show that in cold-blooded vertebrates (Anolis carolinensis, Xenopus tropicalis and Danio rerio), where genome-wide classical CpG island predictions consist primarily of false positives, longer, primarily AT-rich DNA sequence features are able to identify these regions much more accurately.
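
For readers unfamiliar with the spectrum kernel named above, the following is a minimal sketch of the general technique, not the authors' implementation. It represents each sequence by its k-mer counts and takes the inner product of those counts as the similarity between two sequences. The function names, the example sequences and the choice of k are illustrative assumptions; the k-mer lengths and SVM settings actually used are not stated in this section.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Spectrum kernel: inner product of the two k-mer count vectors."""
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # Counter returns 0 for k-mers that are absent from the second sequence.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    # Hypothetical toy sequences; k=2 is chosen only to keep the example small.
    print(spectrum_kernel("ACGTACGT", "ACGTTTTT", 2))
```

A matrix of such kernel values between all pairs of training sequences could then be passed to any SVM implementation that accepts a precomputed kernel.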


Introduction

DNA methylation is known to play an important role in vertebrate gene regulation [1, 2]. Most of the human genome is usually methylated, but over 30 years ago a relatively small number of non-methylated regions were identified using methylation-sensitive restriction enzymes [3]. These non-methylated regions were found to contain more CpG dinucleotides than expected when compared to the rest of the genome, and it was suggested that this is because methylated CpGs are more likely than non-methylated CpGs to mutate to CpAs or TpGs, which depletes CpG dinucleotides across most of the genome [4, 5]. A variant of this method is still used to provide an annotation of CpG islands in the popular UCSC Genome Browser [7], and for many years it has been used as a proxy for non-methylated regions of the genome.
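
The CpG enrichment described above is commonly quantified as an observed-to-expected CpG ratio. The sketch below assumes the standard formulation, in which the expected count treats C and G as occurring independently; the function name and example sequences are illustrative and not taken from the paper.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Expected CpG count assumes independence of C and G:
        expected = (#C * #G) / length
    Returns 0.0 when the ratio is undefined (no C, no G, or empty input).
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    if n == 0 or c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = c * g / n
    return observed / expected


if __name__ == "__main__":
    # Hypothetical toy sequences: one CpG-dense, one with C and G but no CpG.
    print(cpg_obs_exp("CGCGCGCG"))
    print(cpg_obs_exp("CATGCATG"))
```

A ratio well below 1 indicates CpG depletion, as seen across most of a methylated genome, while values near or above 1 indicate that CpGs have been retained.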
