Abstract
Background: The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions. However, the biological function of CNC remains elusive. CNC differ in two aspects from conserved protein-coding regions. They are not conserved across phylum boundaries, and they do not contain readily detectable sub-domains. Here we characterize the persistence length and time of CNC and conserved protein-coding regions in the vertebrate and insect lineages.
Results: The persistence length is the length of a genome region over which a certain level of sequence identity is consistently maintained. The persistence time is the evolutionary period during which a conserved region evolves under the same selective constraints. Our main findings are: (i) Insect genomes contain 1.60 times less conserved information than vertebrates; (ii) Vertebrate CNC have a higher persistence length than conserved coding regions or insect CNC; (iii) CNC have shorter persistence times as compared to conserved coding regions in both lineages.
Conclusion: Higher persistence length of vertebrate CNC indicates that the conserved information in vertebrates and insects is organized in functional elements of different lengths. These findings might be related to the higher morphological complexity of vertebrates and give clues about the structure of active CNC elements. Shorter persistence time might explain the previously puzzling observations of highly conserved CNC within each phylum, and of a lack of conservation between phyla. It suggests that CNC divergence might be a key factor in vertebrate evolution. Further evolutionary studies will help to relate individual CNC to specific developmental processes.
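The abstract defines persistence length as the length of a region over which a given level of sequence identity is consistently maintained. The Python sketch below is one possible way to make that definition concrete on a gapped pairwise alignment. It is not the authors' method: the sliding-window approach, the function names `window_identities` and `conserved_run_lengths`, and the parameters `window` and `min_identity` are all assumptions introduced for illustration.

```python
def window_identities(seq_a, seq_b, window):
    """Fraction of identical, non-gap columns in each sliding window.

    seq_a and seq_b are two rows of a pairwise alignment (equal length,
    '-' for gaps). Illustrative helper, not taken from the paper.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both rows carry the same base, 0 for mismatches and gaps.
    matches = [int(a == b and a != "-") for a, b in zip(seq_a, seq_b)]
    if len(matches) < window:
        return []
    total = sum(matches[:window])
    identities = [total / window]
    for i in range(window, len(matches)):
        # Slide the window one column to the right.
        total += matches[i] - matches[i - window]
        identities.append(total / window)
    return identities


def conserved_run_lengths(seq_a, seq_b, window=5, min_identity=0.8):
    """Lengths, in alignment columns, of maximal regions in which every
    sliding window keeps identity >= min_identity (assumed criterion)."""
    lengths = []
    run = 0  # number of consecutive window starts that pass the threshold
    for identity in window_identities(seq_a, seq_b, window):
        if identity >= min_identity:
            run += 1
        else:
            if run:
                # 'run' overlapping windows span run + window - 1 columns.
                lengths.append(run + window - 1)
            run = 0
    if run:
        lengths.append(run + window - 1)
    return lengths
```

Under these assumptions, the distribution of the returned lengths for a class of regions (for example CNC versus coding) would be the quantity to compare. The window size and identity threshold are free parameters here and would need to be chosen to match the analysis in question.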
Highlights
The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions
We chose to systematically compare human-chicken alignments for vertebrates to Drosophila melanogaster-D. virilis alignments for drosophilids, since the unconstrained mutational distances are very close for these species pairs (Additional file 1), and all the pairs are distant enough to allow a clear separation between con
Most coding sequences are included in the alignments for all genome pairs. 69.45% of human CDS are included in human-chicken (Hs-Gg) alignments, 96.55% of D. melanogaster CDS in D. melanogaster-D. virilis (Dm-Dv) alignments
Summary
The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions. CNC differ from conserved protein-coding regions in two respects: they are not conserved across phylum boundaries, and they do not contain readily detectable sub-domains. We characterize the persistence length and time of CNC and conserved protein-coding regions in the vertebrate and insect lineages. Large-scale conservation of non-coding genomic regions was discovered by Dermitzakis et al. after alignment of human chromosome 21 to homologous regions of the mouse genome. This work reported that protein-coding genes were more conserved overall than non-genic regions, giving a large-scale confirmation that evolutionary conservation is a hallmark of biological function. Conserved non-coding regions (CNC) are referred to by others as conserved non-genic (CNG) regions [1], conserved noncoding elements (CNE) [4] or highly conserved elements (HCE) [5].