We have studied the statistical constraints on synonymous codon choice in order to evaluate various proposals for the origin of the bias in synonymous codon usage observed by Fiers et al. (1975), Air et al. (1976), Grantham et al. (1980) and others. We determined the statistical dependence of the degenerate third base on each of its nearest neighbors in mitochondrial, prokaryotic, and eukaryotic coding sequences. This dependence increases as one moves from mitochondria to prokaryotes to eukaryotes. A statistical model assuming random, equiprobable selection of synonymous codons was broadly adequate for the mitochondria but totally inadequate for prokaryotes and eukaryotes. A model in which synonymous codon selection reflects a genomic strategy, i.e. the genome hypothesis of Grantham et al. (1980), gave a good approximation of the mitochondrial sequences. A statistical model that preserves codon frequencies exactly but allows synonymous codons to exchange positions was only broadly adequate for prokaryotes and totally inadequate for eukaryotes. The results of these simulations are consistent with the measurements on the experimentally determined sequences and suggest that a "frequency constraint" model such as that of Grantham et al. (1980) may adequately explain codon usage in mitochondria. In prokaryotes, however, codon context may constrain synonymous codon choice in addition to this frequency constraint. Furthermore, any proposal to explain codon usage in eukaryotes must involve a constraint on the context of a codon in the sequence.
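The comparison described above has two parts: a measure of how strongly the third codon base depends on its neighbors, and simulated sequences from null models to compare against. Below is a minimal sketch of that workflow under stated assumptions. The abstract does not name its dependence statistic, so mutual information (in bits) is used here as a stand-in. The code also assumes the standard genetic code (NCBI table 1) and in-frame sequences that contain only A, C, G and T. All function names, and the pseudocount in the codon-usage model, are hypothetical. The sketch implements the three null models the abstract mentions: equiprobable synonym choice, synonym choice weighted by genomic codon usage, and a shuffle that keeps codon counts and the protein fixed while letting synonymous codons change positions.

```python
import math
import random
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1), bases in TCAG order. Assumption: swap in
# the appropriate table for mitochondrial genes (e.g. TGA = Trp in vertebrates).
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SYNONYMS = defaultdict(list)  # amino acid -> list of its codons
for codon, aa in CODE.items():
    SYNONYMS[aa].append(codon)

def split_codons(seq):
    """Split an in-frame coding sequence into codons (trailing bases dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def mutual_information(pairs):
    """Mutual information (bits) between the two members of each pair."""
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    return sum(c / n * math.log2(c * n / (left[x] * right[y]))
               for (x, y), c in joint.items())

def third_base_dependence(seq):
    """Dependence of each third base on its 5' neighbour (same codon's second
    base) and on its 3' neighbour (next codon's first base)."""
    cods = split_codons(seq)
    five_prime = [(c[1], c[2]) for c in cods]
    three_prime = [(a[2], b[0]) for a, b in zip(cods, cods[1:])]
    return mutual_information(five_prime), mutual_information(three_prime)

def equiprobable_model(seq, rng):
    """Replace every codon by a synonym drawn uniformly at random."""
    return "".join(rng.choice(SYNONYMS[CODE[c]]) for c in split_codons(seq))

def codon_usage_model(seq, usage, rng):
    """Draw synonyms in proportion to `usage`, a hypothetical genome-wide
    codon-count table (a frequency-constraint / genome-hypothesis model)."""
    out = []
    for c in split_codons(seq):
        syns = SYNONYMS[CODE[c]]
        weights = [usage.get(s, 0) + 1 for s in syns]  # +1 pseudocount (assumed)
        out.append(rng.choices(syns, weights=weights)[0])
    return "".join(out)

def frequency_preserving_shuffle(seq, rng):
    """Permute codons among positions coding the same amino acid: codon counts
    and the protein are kept exactly, but codon context is randomised."""
    cods = split_codons(seq)
    positions = defaultdict(list)
    for i, c in enumerate(cods):
        positions[CODE[c]].append(i)
    shuffled = list(cods)
    for idx in positions.values():
        picked = [cods[i] for i in idx]
        rng.shuffle(picked)
        for i, c in zip(idx, picked):
            shuffled[i] = c
    return "".join(shuffled)

if __name__ == "__main__":
    rng = random.Random(0)
    gene = "ATGGCTGCAAAAGGCCTGCTGTTAGCTTAA"  # hypothetical toy ORF
    print("observed (5', 3') MI:", third_base_dependence(gene))
    null = [third_base_dependence(frequency_preserving_shuffle(gene, rng))[1]
            for _ in range(200)]
    print("shuffled 3' MI, mean:", sum(null) / len(null))
```

In practice, the observed dependence for a real gene set would be compared with the distribution obtained from many simulated copies under each model. A model counts as adequate if the observed value lies well within that distribution. The shuffle model is useful because it separates the effect of codon frequency from the effect of codon context. If a dependence survives the shuffle, it cannot be explained by codon counts alone.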