In many organisms, variation among genes in the relative frequencies of synonymous codons reflects variation in local compositional biases and the intensity of natural selection on codon usage. Identification of preferred codons has been straightforward in most important model organisms, but this has not been the case with Arabidopsis thaliana. A data set of 13,812 genes suitable for our analyses was constructed from the most recent annotation of the complete A. thaliana genome. Factor analysis was performed to identify the primary trend through overall codon usage. Although the primary trend in codon usage in A. thaliana reflects the relative usage of G/C‐ending compared with A/U‐ending codons, there is no correlation with noncoding G + C content. The usage of G‐ and C‐ending codons is actually negatively correlated with noncoding G + C content. However, the usages of several codons, mainly C‐ and U‐ending, are correlated with recently published data on gene expression in roots. These codons tend to correspond to the more commonly represented tRNA anticodons. Thus, a hypothetical set of translationally optimal codons has been identified in this important model system.
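The contrast between G/C‐ending and A/U‐ending codons described above can be illustrated with a per‐gene summary statistic. The following is a minimal sketch, not the authors' procedure: the function name `gc3_fraction` and the choice to exclude Met, Trp, and stop codons (which offer no synonymous choice) are assumptions made for illustration only.

```python
def gc3_fraction(cds: str) -> float:
    """Fraction of codons ending in G or C, among codons that have synonymous alternatives.

    Hypothetical helper for illustration; the input is assumed to be an
    in-frame coding sequence (DNA or RNA alphabet).
    """
    seq = cds.upper().replace("U", "T")
    # Met (ATG), Trp (TGG) and the three stop codons carry no synonymous choice
    # in the standard genetic code, so they are left out of the count.
    excluded = {"ATG", "TGG", "TAA", "TAG", "TGA"}
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Keep only unambiguous codons that are informative about synonymous usage.
    informative = [c for c in codons if c not in excluded and set(c) <= set("ACGT")]
    if not informative:
        raise ValueError("sequence contains no informative codons")
    return sum(c[2] in "GC" for c in informative) / len(informative)
```

Computed gene by gene, a value of this kind could then be compared with the G + C content of flanking noncoding sequence or with expression data, which is the type of comparison the abstract summarizes.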