All complete retrovirus sequences in the GenEMBL database were examined with the goal of assessing possible relationships between the nucleotide composition of retroviral genomes, the amino acid composition of retroviral proteins, and evolutionary strategies used by retroviruses. The results demonstrated that the genome of each viral lineage has a characteristic base composition and that the variations between groups are related to retroviral phylogeny. By analogy to microbial species, we suggest that the variations arise from group-specific patterns of directional mutations where the bias can be exerted on any of the four nucleotides. It is most likely that the mutational patterns are introduced during reverse transcription, and a direct participation of reverse transcriptase in the process is suspected. A straightforward strategy was used to analyze the compositional relationship between nucleotides and encoded amino acids. The procedure entailed calculations of amino acid frequencies from nucleotide content and the comparison of the calculated values to the observed amino acid frequencies in retroviruses. The results revealed an excellent correspondence between variation in genomic base composition and variation in amino acid composition of proteins with the compositional differences extending into all major coding regions of the viruses. Because of the magnitude and dispersion of these effects, and because of the nonconservative nature of many of the substitutions between groups with different genomic biases, we suggest that the variations in protein composition driven by biased nucleotide frequencies are an important factor in shaping the characteristic phenotypes of the different viral lineages. A clue to the nature of the evolutionary forces that are responsible for the generation of nucleotide biases was provided by the observation that viruses with radically different base frequencies most often inhabit the same cell type. 
This observation, along with analysis of amino acid and nucleotide replacement patterns between and within reverse transcriptase sequences from the various groups, permitted us to advance a model for the evolution of retroviruses. According to the model, speciation could initiate when daughter virions from a single progenitor vary in the direction of their mutational bias. These variations would exert a pleiotropic effect on the frequencies of nucleotides in all viral genes and consequently on the frequencies of amino acids in the encoded proteins. The variants with the most extreme compositional differences would have a selective advantage because their different precursor requirements would enable them to occupy different ecological niches within a single cell. (ABSTRACT TRUNCATED AT 400 WORDS)
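The comparison of calculated and observed amino acid frequencies described above can be illustrated with a minimal sketch. The abstract does not give the details of the calculation, so everything here is an assumption made for illustration: the three codon positions are treated as independent draws from the genomic base frequencies, the standard genetic code is used, and stop codons are excluded before normalizing. The function names (`base_frequencies`, `expected_aa_frequencies`, `observed_aa_frequencies`) are hypothetical and are not taken from the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first base varies slowest, third base fastest, each in T, C, A, G order).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def base_frequencies(sequence):
    """Return the fraction of T, C, A and G in a nucleotide sequence.

    RNA input is accepted (U is counted as T); ambiguous symbols are ignored.
    """
    sequence = sequence.upper().replace("U", "T")
    counts = {base: sequence.count(base) for base in BASES}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T/U")
    return {base: counts[base] / total for base in BASES}


def expected_aa_frequencies(base_freq):
    """Amino acid frequencies expected from base composition alone.

    Assumption: each codon's probability is the product of the frequencies
    of its three bases. Stop codons are dropped and the remaining
    probabilities are renormalized to sum to 1.
    """
    expected = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # skip stop codons
            continue
        p = base_freq[codon[0]] * base_freq[codon[1]] * base_freq[codon[2]]
        expected[aa] = expected.get(aa, 0.0) + p
    total = sum(expected.values())
    return {aa: p / total for aa, p in expected.items()}


def observed_aa_frequencies(protein):
    """Observed amino acid frequencies in a one-letter protein sequence."""
    valid = set(AMINO_ACIDS) - {"*"}
    residues = [aa for aa in protein.upper() if aa in valid]
    if not residues:
        raise ValueError("protein contains no standard residues")
    return {aa: residues.count(aa) / len(residues) for aa in sorted(valid)}
```

Under these assumptions, a genome enriched in A yields a higher expected lysine frequency (codons AAA and AAG) than a genome with equal base frequencies. Comparing the output of `expected_aa_frequencies` with that of `observed_aa_frequencies` for the encoded proteins gives the kind of calculated-versus-observed comparison the abstract refers to.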