Abstract

CpG islands (CGIs) are among the most widely studied regulatory features of the human genome, with critical roles in development and disease. Despite this significance and their original epigenetic definition, currently used CGI sets are typically predicted from DNA sequence characteristics. Although CGIs are central to practical analyses of DNA methylation, recent studies have shown that such computational annotations suffer from inaccuracies. Here we used whole-genome bisulfite sequencing from 10 diverse human tissues to identify a comprehensive, experimentally obtained, single-base resolution CGI catalog. Beyond its unparalleled annotation precision, our method is free from potential bias due to arbitrary sequence features or probe affinity differences. In addition to clarifying substantial false positives in the widely used University of California Santa Cruz (UCSC) annotations, our study identifies numerous novel epigenetic loci. In particular, we reveal a significant impact of transposable elements on the epigenetic regulatory landscape of the human genome and demonstrate the ubiquitous presence of transcription initiation at CGIs, including alternative promoters in gene bodies and non-coding RNAs in intergenic regions. Moreover, coordinated DNA methylation and chromatin modifications mark tissue-specific enhancers at novel CGIs. Enrichment of specific transcription factor binding from ChIP-seq supports mechanistic roles of CGIs in the regulation of tissue-specific transcription. The new CGI catalog provides a comprehensive and integrated list of genomic hotspots of epigenetic regulation.

Highlights

  • We identified a set of 51 572 non-overlapping experimentally supported CGIs (eCGIs) across tissues (Supplementary Material, Table S1)

  • We find that 27.5% of the intergenic eCGIs overlap with non-coding RNAs in the NONCODE V4 database [52] and 43.1% of the intergenic eCGIs have an ncRNA within 3 kb

Introduction

Human Molecular Genetics, 2016, Vol 25, No 1

Since their initial discovery almost three decades ago [1,2,3], numerous studies have established the critical importance of CpG islands (CGIs) in fundamental regulatory and developmental processes [4,5,6,7,8]. Even though CGIs were originally defined experimentally [1], subsequent annotations of CGIs relied on sequence-based computational algorithms, owing to the lack of actual DNA methylation data [2,19,20,21]. These computational algorithms have been extremely valuable for almost two decades. However, many hypomethylated CpG-rich sequences (representing the very definition of CGIs) are missing from the computationally annotated CGI sets [5,24] (i.e. false negatives). Revisiting the epigenetic definition of CGIs and providing an experimentally defined CGI catalog that overcomes the limitations of computational predictions will offer a tremendous resource for advancing our knowledge
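The text above refers to sequence-based computational algorithms without stating their criteria. As a rough illustration only, the sketch below assumes the commonly cited Gardiner-Garden and Frommer style thresholds (length of at least 200 bp, GC fraction of at least 0.5, observed/expected CpG ratio of at least 0.6). The function names and default thresholds are hypothetical choices for this example; they are not taken from this study and do not reproduce the UCSC annotation pipeline.

```python
from __future__ import annotations


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently: (C * G) / N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cgi(
    seq: str,
    min_len: int = 200,        # assumed threshold, not from the source
    min_gc: float = 0.5,       # assumed threshold, not from the source
    min_obs_exp: float = 0.6,  # assumed threshold, not from the source
) -> bool:
    """Apply purely sequence-based criteria to one candidate region."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp
```

A check of this kind sees only base composition. It has no access to methylation state, which is consistent with the point made above that sequence-derived sets can miss hypomethylated CpG-rich regions.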
