The relationship between the repetitiveness of genomic sequences and genome size has been re-investigated using the rapidly growing database of complete eubacterial and archaeal genome sequences, combined with the fragmentary but now large body of data from eukaryotic genomes. Relative simplicity factors (RSFs), which measure the repetitiveness of sequences, were calculated, and significantly simple motifs (SSMs), which indicate the kinds of sequences that are repeated, were identified. A previously reported correlation between genome size and repetitiveness was confirmed, but the higher RSFs seen in eukaryotic genomes were shown also to reflect a generally higher level of repetitiveness that is independent of genome size differences. Differences in genome size account for about 10% of the variance in RSF seen between species. The spectrum of SSMs seen within a genome differed markedly among the eubacteria but less so among eukaryotes and, particularly, archaea. Species with SSM spectra that differ from the norm also tend to have high RSFs for their genome size and to be pathogens that use repetitive sequences to evade host defence responses. Some of the variance in repetitiveness seen in other species may therefore also reflect the action of selection, although other forces, such as variation in the effectiveness of mechanisms for regulating slippage errors of replication, may also be important.