Abstract
Background
Recently, it has been discovered that the human genome contains many transcription start sites for non-coding RNA. Regulatory regions related to transcription of these non-coding RNAs are poorly studied. Some of these regulatory regions may be associated with CpG islands located far from the transcription start site of any protein-coding gene. The human genome contains many such CpG islands; however, until now their properties have not been systematically studied.
Results
We studied CpG islands located in different regions of the human genome using bioinformatics and comparative genomics methods. We observed that CpG islands preferentially overlap with exons, including exons located far from the transcription start site, but usually extend well into introns. The synonymous substitution rate of CpG-containing codons is substantially reduced in regions where CpG islands overlap with protein-coding exons, even if these regions are located far downstream from the transcription start site. CAGE tag analysis revealed frequent transcription start sites in all CpG islands, including those found far from transcription start sites of protein-coding genes. Computational prediction and analysis of published ChIP-chip data revealed that CpG islands contain an increased number of sites recognized by the Sp1 protein. CpG islands containing more CAGE tags usually also contain more Sp1 binding sites. This is especially relevant for CpG islands located in 3' gene regions. Various examples of transcription, confirmed by mRNAs or ESTs but with no evidence of protein-coding genes, were found in CAGE-enriched CpG islands located far from the transcription start site of any known protein-coding gene.
Conclusions
CpG islands located far from transcription start sites of protein-coding genes have transcription initiation activity and display Sp1 binding properties. In exons overlapping these islands, the synonymous substitution rate of CpG-containing codons is decreased. This suggests that these CpG islands are involved in transcription initiation, possibly of some non-coding RNAs.
Highlights
It has been discovered that the human genome contains many transcription start sites for non-coding RNA
This agrees with the large number of CpG islands (CGIs) not associated with the transcription start site of any known protein-coding gene
We demonstrate that many such CGIs appear to be related to transcription initiation, and at least some of them contain CpG pairs stabilized by natural selection
Summary
It has been discovered that the human genome contains many transcription start sites for non-coding RNA. Regulatory regions related to transcription of these non-coding RNAs are poorly studied. Some of these regulatory regions may be associated with CpG islands (CGIs) located far from the transcription start site of any protein-coding gene. The increased number of CpG sites in CGIs is often correlated with low methylation of cytosine in CpG dinucleotides [6,7,8,9]. This effect is usually explained by postulating that these sites are protected from DNA methyltransferase by abundant and commonly utilized DNA-binding proteins, including Sp1 [10], E2F [11], CTCF [12] and others. It was found that Sp1 can bind to both methylated and non-methylated variants of its binding site [14], and can protect non-methylated sites from methylation [10].
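The "increased number of CpG sites" that characterizes a CGI is commonly quantified as an observed/expected CpG ratio together with GC content. The sketch below illustrates that calculation in Python. The function names and the thresholds (length of at least 200 bp, GC fraction above 0.5, observed/expected ratio above 0.6) are assumptions based on the widely used Gardiner-Garden and Frommer criteria; the text above does not state which CGI definition the study used.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction and observed/expected CpG ratio of a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() finds every CpG
    gc_fraction = (c + g) / n if n else 0.0
    # Observed CpG count divided by the count expected from base composition:
    # expected = (C count * G count) / length
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp": obs_exp}


def looks_like_cgi(seq: str,
                   min_length: int = 200,       # assumed threshold
                   min_gc: float = 0.5,         # assumed threshold
                   min_obs_exp: float = 0.6) -> bool:  # assumed threshold
    """Check a sequence against the assumed CGI criteria (illustrative only)."""
    s = cpg_stats(seq)
    return (s["length"] >= min_length
            and s["gc_fraction"] > min_gc
            and s["obs_exp"] > min_obs_exp)
```

For example, `looks_like_cgi(region)` could be applied to a candidate region extracted from a genome assembly. A real analysis would normally scan the genome with a sliding window or use precomputed CGI annotations rather than test isolated strings.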