Abstract
Conserved, ultraconserved and other classes of constrained elements (collectively referred to here as CNEs), identified by comparative genomics in a wide variety of genomes, are non-randomly distributed across chromosomes. These elements are defined using various degrees of conservation between organisms and several thresholds of minimal length. Here we investigate the chromosomal distribution of CNEs by studying the statistical properties of the distances between consecutive CNEs. We find widespread power-law-like distributions of inter-CNE distances, i.e. distributions that appear linear on a double logarithmic scale, a feature linked to fractality and self-similarity. Because CNEs are often spatially associated with genes, especially genes that regulate developmental processes, we use appropriate gene masking to verify that a power-law-like pattern emerges irrespective of whether elements located close to or inside genes are excluded. To explain these findings we put forward an evolutionary model that combines segmental or whole-genome duplication events with the elimination (loss) of most of the duplicated CNEs. Simulations of this model reproduce the main features of the observed size distributions. Power-law-like patterns in the genomic distributions of CNEs are consistent with current knowledge of their evolutionary history in several genomes.
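The analysis outlined above (distances between consecutive CNEs, a log-log view of their distribution, and a duplication-plus-loss model) can be illustrated with a short Python sketch. It is a toy built on stated assumptions, not the authors' model or code. The genome is assumed to be a list of booleans in which True marks a CNE unit. A distance is assumed to be the number of non-CNE units between consecutive CNE units. Each duplication is assumed to copy a random segment in tandem, and each duplicated CNE unit is assumed to be lost with probability `loss_prob`. All function names and parameter values (`simulate`, `max_segment`, `loss_prob`, etc.) are hypothetical. The slope is a plain least-squares fit to the complementary cumulative distribution on log-log axes, which serves as a quick diagnostic rather than a rigorous power-law test.

```python
import math
import random


def inter_cne_distances(genome):
    """Number of non-CNE units between consecutive CNE units (hypothetical helper).

    `genome` is a sequence of booleans: True marks a CNE unit. Adjacent CNE
    units give a distance of 0; ccdf() drops zeros before the log transform.
    """
    positions = [i for i, is_cne in enumerate(genome) if is_cne]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def ccdf(values):
    """Complementary cumulative distribution P(D >= d) for each distinct d > 0."""
    positive = sorted(v for v in values if v > 0)
    n = len(positive)
    points = []
    for i, d in enumerate(positive):
        # At the first occurrence of d in ascending order, n - i values are >= d.
        if i == 0 or d != positive[i - 1]:
            points.append((d, (n - i) / n))
    return points


def loglog_slope(points):
    """Least-squares slope of log10 P(D >= d) against log10 d (needs >= 2 points)."""
    xs = [math.log10(d) for d, _ in points]
    ys = [math.log10(p) for _, p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def simulate(initial_len=2000, cne_density=0.05, steps=1000,
             max_segment=200, loss_prob=0.9, seed=1):
    """Toy duplication-and-loss model (hypothetical, not the published model)."""
    rng = random.Random(seed)
    # Random initial genome: each unit is a CNE with probability cne_density.
    genome = [rng.random() < cne_density for _ in range(initial_len)]
    for _ in range(steps):
        seg_len = rng.randint(1, min(max_segment, len(genome)))
        start = rng.randrange(len(genome) - seg_len + 1)
        copy = genome[start:start + seg_len]
        # Eliminate most duplicated CNEs: each copied CNE unit survives
        # only with probability 1 - loss_prob.
        copy = [is_cne and rng.random() >= loss_prob for is_cne in copy]
        # Insert the duplicate directly after the original segment (tandem copy).
        genome[start + seg_len:start + seg_len] = copy
    return genome


if __name__ == "__main__":
    genome = simulate()
    points = ccdf(inter_cne_distances(genome))
    print(f"{sum(genome)} CNE units in a genome of {len(genome)} units")
    print(f"least-squares log-log slope of P(D >= d): {loglog_slope(points):.2f}")
```

Whether the simulated distribution looks linear on log-log axes depends on the chosen parameters. The sketch only shows the mechanics of measuring distances and running a duplication-plus-loss process. The cumulative distribution is used instead of a histogram because it avoids having to choose bin widths.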
Highlights
In our analysis we include CNE datasets from various taxonomic groups and compare CNE populations exapted at different evolutionary stages
The studied CNEs are mapped on different genomes
Summary
The sequencing and comparative analysis of many mammalian genomes has indicated that at least 5.5% of the human genome is under selective constraint. Of that, 1.5% is estimated to code for proteins and 3.5% has known regulatory functions, while little or no information is available on the function of the rest [1]. One of the most interesting findings to arise from comparative analysis of mammalian genomes is the discovery of hundreds of ultraconserved elements (UCEs) longer than 200 bp that are absolutely conserved among the human, mouse and rat genomes [2]. One in four UCEs overlaps a known protein-coding gene. Such a high degree of conservation (100%) is not expected even in exons, because the degeneracy of the genetic code allows synonymous substitutions that leave the encoded protein unchanged. Studies of these elements have used several thresholds of minimal conserved length, and some have also excluded elements lying inside protein-coding genes [3,4]. Here we use the name of a specific class (e.g. UCEs) only when referring to that class; otherwise we use the collective term CNEs.
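The UCE definition above (more than 200 bp with 100% identity across human, mouse and rat) can be expressed as a scan over a multiple alignment. The Python sketch below rests on several assumptions. The three orthologous sequences are assumed to be already aligned into equal-length strings. An alignment gap or any non-ACGT character is assumed to break a run, on the premise that absolute conservation excludes insertions and deletions. Hits are reported as half-open column ranges. The optional masking step mirrors the exclusion of elements inside protein-coding genes, with gene intervals assumed to be given in the same alignment coordinates. The function names are hypothetical, and real pipelines operate on genome-scale alignments and map hits back to chromosome coordinates.

```python
def _identical_column(aligned, col):
    """True if every sequence has the same base (A/C/G/T, case-insensitive) at col."""
    first = aligned[0][col].upper()
    return first in "ACGT" and all(s[col].upper() == first for s in aligned[1:])


def ultraconserved_runs(aligned, longer_than=200):
    """Half-open (start, end) column ranges of identical, gap-free columns
    whose length is strictly greater than `longer_than` (hypothetical helper)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    runs, run_start = [], None
    # Iterate one column past the end so that a run reaching the end is closed.
    for col in range(length + 1):
        identical = col < length and _identical_column(aligned, col)
        if identical and run_start is None:
            run_start = col
        elif not identical and run_start is not None:
            if col - run_start > longer_than:
                runs.append((run_start, col))
            run_start = None
    return runs


def exclude_overlapping(runs, masked):
    """Drop runs that overlap any half-open (start, end) masked interval, e.g. genes."""
    return [(s, e) for s, e in runs
            if not any(s < m_end and m_start < e for m_start, m_end in masked)]


if __name__ == "__main__":
    # Tiny illustrative alignment with one mismatch at column 10.
    human = "ACGTACGTAC" + "G" + "TTTTAAAA"
    mouse = "ACGTACGTAC" + "C" + "TTTTAAAA"
    rat = "ACGTACGTAC" + "G" + "TTTTAAAA"
    runs = ultraconserved_runs([human, mouse, rat], longer_than=5)
    print(runs)                                   # [(0, 10), (11, 19)]
    print(exclude_overlapping(runs, [(12, 15)]))  # [(0, 10)]
```

The toy example lowers the threshold to 5 columns so the output is readable. With the default `longer_than=200`, only runs of at least 201 identical columns would be reported.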