Abstract

The field of comparative genomics relies upon inference of neutrality or selection from sequence conservation. Recent studies of exactly-conserved sequences have revealed an anomalous, algebraic distribution of conserved sequence lengths that is inconsistent with standard models of neutral evolution based solely on local mutation. It has been proposed that linkage contributes to the shape of this anomalous distribution. Here we identify, for a variety of species, all ‘maximal’ repeats, direct or reverse-complement, within a chromosomal or whole-genome sequence of a single genome. For the set of maximal repeats of a given nucleotide length L, we report that the number of elements in the set, F(L), typically exhibits an algebraic tail. We propose a method based on a cost function that allows us to analyze this distribution and estimate the range over which the distribution is most likely to be well approximated by a power law. We find that the range is proportional to the genome size and that, although the power-law exponent differs between species, it falls chiefly within a relatively narrow range of values. For some genomes we observe a sharp cut-off in the power-law regime that coincides with a peak in contig lengths and can therefore be attributed to artifacts of genome assembly; this leads to the prediction that the extent of the power-law regime will increase as assemblies are improved. The typical algebraic behavior of the length-frequency distribution is the most remarkable observation emerging from our analysis. The algebraic form of the empirical distribution of duplication lengths characterized here suggests that recombination events might, as a general rule, involve transfer of chunks of sequence with an algebraic length distribution. It also places strong constraints on any model of genome evolution.
The observation of an algebraic distribution of exactly-duplicated sequence lengths within a genome is a direct demonstration of the net impact of linkage on genome evolution, and is consistent with the proposal that linkage contributes to the anomalous distribution of strongly-conserved sequence lengths.
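
To make the notion of an algebraic tail concrete, the sketch below estimates an exponent by fitting a straight line to log F(L) against log L over a chosen range of lengths. It is an ordinary least-squares fit on log-log axes, not the cost-function method proposed here, whose details the abstract does not give. The function name `fit_power_law_exponent`, the bounds `l_min` and `l_max`, and the synthetic counts are all assumptions made for illustration; none of the numbers come from a real genome.

```python
import math


def fit_power_law_exponent(counts, l_min, l_max):
    """Fit log F(L) = intercept + slope * log L by least squares.

    counts: dict mapping repeat length L to the number of repeats F(L).
    l_min, l_max: inclusive bounds of the fitted range (hypothetical
    parameters; the abstract's method estimates this range itself).
    Returns (slope, intercept); a power law F(L) ~ C * L**slope
    appears as a straight line with this slope on log-log axes.
    """
    points = [
        (math.log(length), math.log(freq))
        for length, freq in counts.items()
        if l_min <= length <= l_max and freq > 0
    ]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two non-empty lengths in the range")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic illustration only: counts generated from an exact power law
# with exponent -3, so the fit should recover a slope close to -3.
synthetic_counts = {L: 1.0e6 * L ** -3.0 for L in range(20, 200)}
slope, intercept = fit_power_law_exponent(synthetic_counts, 20, 199)
print(f"fitted exponent: {slope:.3f}")
```

On real counts the fitted slope depends strongly on the chosen range, which is why a principled way of selecting that range matters.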
