Abstract
Although CpG dinucleotides are significantly depleted relative to other dinucleotides in mammalian genomes, they can cluster to form CpG islands, which localize around the 5ʹ regions of genes, where they function as promoters. CpG-island promoters are generally unmethylated and are often found in housekeeping genes. However, neither their nucleotide sequences nor their existence per se is conserved between humans and mice, which may be due to evolutionary gain and loss of the regulatory regions. In this study, the human and rhesus monkey genomes, whose sequences are moderately conserved, were compared at base resolution. Using transcription start site data, we first validated the ability of our methods to identify orthologous promoters and identified a limitation of using the 5ʹ ends of curated gene models, such as NCBI RefSeq, as transcription start sites. We found that, in addition to deamination mutations, insertions and deletions of bases, repeats, and long fragments contributed to the mutations of CpG dinucleotides. We also observed that G + C content tended to change in CpG-poor environments, whereas CpG content was altered in G + C-rich environments. While loss of CpG islands can be caused by a gradual decrease in CpG sites, gain of these islands appears to require two distinct nucleotide-altering steps. Taken together, our findings provide novel insights into the process of acquisition and diversification of CpG-island promoters in vertebrates.
Highlights
In vertebrate genomic sequences, the content of CpG dinucleotides is significantly lower than expected from the nucleotide composition (Bird 1980), which is likely due to DNA methylation.
Although CpG islands are often associated with gene promoters (Bird 1987), more than half are likely located in nonpromoter regions of the human genome (Takai and Jones 2003).
Curated cDNA sequences, such as the datasets provided by NCBI RefSeq, have been used to infer putative promoter regions (Pruitt et al. 2014).
Summary
In vertebrate genomic sequences, the content of CpG dinucleotides is significantly lower than expected from the nucleotide composition (Bird 1980), which is likely due to DNA methylation. A limited number of unmethylated CpG sites congregate in specific locations, forming CpG islands. Invertebrate organisms also retain a DNA methylation system that depends on this dinucleotide. In contrast to the sparse and localized presence of vertebrate CpG islands, invertebrate CpG sites form much longer tracts that are distributed in a mosaic manner (Tweedie et al. 1997). As vertebrates and invertebrates have distinct gene regulatory mechanisms (Zemach and Zilberman 2010), understanding the evolutionary gain of CpG islands will provide greater insight into the human gene regulatory system.
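The comparison between observed and expected CpG content can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the study: it assumes the widely used observed/expected ratio, (number of CpG × sequence length) / (number of C × number of G), and the function names `gc_content` and `cpg_obs_exp` are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G).

    Assumption: the expected CpG count is derived from the C and G
    frequencies of the same sequence. Returns 0.0 when the sequence
    has no C or no G, since the ratio is undefined in that case.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg * n / (c * g)


if __name__ == "__main__":
    # Toy sequences for illustration, not real genomic data.
    for name, s in [("CpG-rich", "CGCGCGCG"), ("CpG-poor", "CCAGGTCA")]:
        print(name, round(gc_content(s), 2), round(cpg_obs_exp(s), 2))
```

A ratio well below 1 indicates CpG depletion relative to base composition, while values near or above 1 are characteristic of CpG-island-like sequence. Any threshold used to call an island is a separate choice that this sketch does not make.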