Rare Codons Cluster

Thomas F Clarke, Patricia L Clark, Dong-Yan Jin

doi:10.1371/journal.pone.0003412

Abstract

Most amino acids are encoded by more than one codon. These synonymous codons are not used with equal frequency: in every organism, some codons are used commonly while others are rare. Although the encoded protein sequence is identical, selective pressures favor more common codons for enhanced translation speed and fidelity. Nevertheless, rare codons persist, presumably due to neutral drift. Here, we determine whether factors beyond neutral drift affect the selection and/or distribution of rare codons. We have developed a novel algorithm that evaluates the relative rareness of the codons used to encode a given protein sequence. We show that rare codons, rather than being randomly scattered across genes, often occur in large clusters. These clusters occur in numerous eukaryotic and prokaryotic genomes and are not confined to unusual or rarely expressed genes: many highly expressed genes, including genes for ribosomal proteins, contain rare codon clusters. A rare codon cluster can impede ribosomal translation of the rare codon sequence. These results indicate that additional selective pressures govern the use of synonymous codons, and specifically that local pauses in translation can be beneficial for protein biogenesis.

Highlights

A synonymous DNA mutation will alter the nucleotide sequence but, due to the degeneracy of the genetic code, does not alter the encoded amino acid sequence.
In order to determine the relative rareness of the codons used to encode a particular amino acid sequence, we developed the %MinMax algorithm. %MinMax defines the relationship between a given mRNA sequence and hypothetical sequences encoding the same protein using the most rare or most common codons, as a function of the arithmetic mean of all possible codon usage frequencies.
The complete %MinMax algorithm is shown in Methods; Figure 1 illustrates %MinMax calculations for a pentapeptide encoded with E. coli codon usage frequencies.
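The %MinMax idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the complete algorithm is given in Methods). It assumes a hypothetical toy usage table, `CODON_USAGE`, whose values are invented and are not E. coli data. For a run of codons, it compares the summed usage of the codons actually present with the summed usage of the most common, most rare, and average (arithmetic mean) synonymous codons at each position. It returns a positive score (up to +100) when the codons are more common than average and a negative score (down to -100) when they are rarer. The names `percent_minmax`, `minmax_profile`, and the `window` parameter are hypothetical.

```python
# Minimal sketch of a %MinMax-style score. The usage table below is a
# HYPOTHETICAL toy example (values invented for illustration), not real
# codon usage data for E. coli or any other organism.
CODON_USAGE: dict[str, dict[str, float]] = {
    "M": {"ATG": 27.0},  # single codon: max == min == mean, adds 0 to the score
    "K": {"AAA": 30.0, "AAG": 10.0},
    "F": {"TTT": 20.0, "TTC": 15.0},
    "G": {"GGT": 25.0, "GGC": 28.0, "GGA": 8.0, "GGG": 11.0},
    "E": {"GAA": 40.0, "GAG": 18.0},
}

# Reverse lookup: codon -> amino acid, built from the table above.
CODON_TO_AA = {c: aa for aa, table in CODON_USAGE.items() for c in table}


def split_codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def percent_minmax(codons: list[str]) -> float:
    """Score codons from -100 (all rarest synonyms) to +100 (all most common)."""
    actual = avg = hi = lo = 0.0
    for codon in codons:
        freqs = CODON_USAGE[CODON_TO_AA[codon]]
        actual += freqs[codon]                   # codon actually used
        avg += sum(freqs.values()) / len(freqs)  # arithmetic mean of synonyms
        hi += max(freqs.values())                # most common synonym
        lo += min(freqs.values())                # most rare synonym
    if actual > avg:  # more common than average: fraction of the way to the max
        return 100.0 * (actual - avg) / (hi - avg)
    if actual < avg:  # rarer than average: fraction of the way to the min
        return -100.0 * (avg - actual) / (avg - lo)
    return 0.0


def minmax_profile(codons: list[str], window: int) -> list[float]:
    """Score every full sliding window of `window` codons along the sequence."""
    return [percent_minmax(codons[i:i + window])
            for i in range(len(codons) - window + 1)]


if __name__ == "__main__":
    # The same pentapeptide (MKFGE) encoded three ways.
    print(percent_minmax(split_codons("ATGAAATTTGGCGAA")))  # → 100.0 (most common)
    print(percent_minmax(split_codons("ATGAAGTTCGGAGAG")))  # → -100.0 (rarest)
    print(percent_minmax(split_codons("ATGAAATTCGGTGAG")))  # mixed encoding
```

With this toy table the mixed encoding scores about +10.4, slightly more common than average. Because each score is normalized against the best and worst possible encodings of the same peptide, scores from different sequences land on a common scale. Scanning a gene with `minmax_profile` and looking for runs of strongly negative windows is one way to locate candidate rare-codon clusters; the window size and any cutoff for calling a cluster are assumptions here.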

Summary

Introduction

A synonymous DNA mutation will alter the nucleotide sequence but, due to the degeneracy of the genetic code, does not alter the encoded amino acid sequence. Previous studies of codon usage relied on algorithms designed to highlight common rather than rare codons [6,7]; this reflects the general interest in increasing translation rate to improve protein expression levels, regardless of the effect on folding yield.
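As a concrete illustration of a synonymous change, the sketch below translates a short coding sequence and a point mutant of it. It uses only the handful of standard-code codons the example needs; the partial table `STANDARD_CODE_SUBSET` and the function `translate` are illustrative helpers, not a complete genetic code. Changing the third base of the lysine codon from AAA to AAG alters the nucleotide sequence but leaves the peptide unchanged.

```python
# Partial standard genetic code: only the codons used in this example.
STANDARD_CODE_SUBSET = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "TTT": "F", "TTC": "F",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "GAA": "E", "GAG": "E",
}


def translate(seq: str) -> str:
    """Translate an in-frame sequence using the partial table above."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(STANDARD_CODE_SUBSET[seq[i:i + 3]]
                   for i in range(0, len(seq), 3))


original = "ATGAAATTTGGCGAA"
mutant = "ATGAAGTTTGGCGAA"  # A->G at nucleotide 6: AAA -> AAG (both lysine)

print(original != mutant)                        # True: the DNA differs
print(translate(original) == translate(mutant))  # True: same peptide
print(translate(original))                       # MKFGE
```

Codons outside the partial table raise a `KeyError`; a real analysis would use the full 64-codon table.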

Results

Conclusion