The translation of mRNA in all forms of life uses three-nucleotide codons and aminoacyl-tRNAs to synthesize a protein. There are 64 possible codons in the genetic code, and the codons for the ∼20 amino acids and the 3 stop codons show 1- to 6-fold degeneracy. Recent studies have shown that families of stress response transcripts, termed modification tunable transcripts (MoTTs), use distinct codon biases that match specifically modified tRNAs to regulate their translation during stress. Similarly, translational reprogramming of the UGA stop codon, either to generate selenoproteins or to perform programmed translational read-through (PTR) that results in a longer protein, requires a distinct codon bias (i.e., more than one stop codon) and, in the case of selenoproteins, a specifically modified tRNA. To identify transcripts whose codon usage patterns could be subject to translational control mechanisms, we used existing genome and transcript data to develop the gene-specific Codon UTilization (CUT) tool and database, which details all 1-, 2-, 3-, 4- and 5-codon combinations for all genes or transcripts in yeast (Saccharomyces cerevisiae), mice (Mus musculus) and rats (Rattus norvegicus). Here, we describe the use of the CUT tool and database to characterize significant codon usage patterns in specific genes and groups of genes. In yeast, we demonstrate how the CUT database can be used to identify genes that have runs of specific codons (e.g., AGA, GAA, AAG) linked to translational regulation by tRNA methyltransferase 9 (Trm9). We further demonstrate how groups of genes can be analyzed to find significant dicodon patterns: the AGA-GAA dicodon is significantly (P < 0.00001) over-represented in the 80 Gcn4-regulated transcripts. We have also used the CUT database to identify mouse and rat transcripts with internal UGA codons, with the surprising finding of 45 and 120 such transcripts, respectively, far more than expected.
The UGA data suggest that there could be many more translationally reprogrammed transcripts than currently reported. CUT thus represents a multi-species codon-counting database that can be used with mRNA-, translation- and proteomics-based results to better understand and model translational control mechanisms.
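The kind of counting that underlies such a database can be illustrated with a minimal Python sketch. It is not the CUT implementation: the function names (`split_codons`, `count_codon_runs`, `internal_stops`) and the toy sequence are hypothetical, and the sketch assumes an in-frame coding sequence that starts at the first codon and counts overlapping windows of n consecutive codons.

```python
from collections import Counter

def split_codons(cds):
    """Split an in-frame coding sequence into codons (RNA alphabet).

    Assumes the sequence starts at the first base of the first codon;
    any trailing partial codon is dropped.
    """
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]

def count_codon_runs(cds, n):
    """Count every window of n consecutive codons (windows overlap)."""
    codons = split_codons(cds)
    return Counter(
        "-".join(codons[i:i + n]) for i in range(len(codons) - n + 1)
    )

def internal_stops(cds, stop="UGA"):
    """Return 0-based codon positions of in-frame `stop` codons,
    excluding the final (terminating) codon."""
    codons = split_codons(cds)
    return [i for i, codon in enumerate(codons[:-1]) if codon == stop]

# Hypothetical toy transcript, not taken from any real genome.
toy = "".join(["ATG", "AGA", "GAA", "AGA", "GAA", "TGA", "AAG", "TAA"])

single = count_codon_runs(toy, 1)    # 1-codon counts
dicodon = count_codon_runs(toy, 2)   # 2-codon (dicodon) counts
print(single["AGA"], dicodon["AGA-GAA"], internal_stops(toy))
```

Calling `count_codon_runs` with n from 1 to 5 yields the five combination sizes mentioned above for a single sequence. Testing whether a dicodon is over-represented in a group of genes would need a separate statistical step, which this sketch does not attempt.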