Abstract

Because of the degeneracy of the genetic code, multiple codons are translated into the same amino acid. Despite being “synonymous,” these codons are not equally used. Selective pressures are thought to drive the choice among synonymous codons within a genome, whereas GC content, which is typically attributed to mutational drift, is the major determinant of codon usage variation across species. Here, we find that, in addition to GC content, interspecies codon usage signatures can also be detected. More specifically, we show that a single amino acid, arginine, is the major contributor to codon usage bias differences across domains of life. We then exploit this finding and show that domain-specific codon bias signatures can be used to classify a given sequence into its corresponding domain of life with high accuracy. Next, we asked whether including codon autocorrelation patterns, which reflect the nonrandom distribution of codon occurrences throughout a transcript, might improve the classification performance of our algorithm. However, we find that autocorrelation patterns are not domain-specific and, surprisingly, are unrelated to tRNA reusage, in contrast to previous reports. Instead, our results suggest that codon autocorrelation patterns are a by-product of codon optimality throughout a sequence: highly expressed genes display autocorrelated “optimal” codons, whereas lowly expressed genes display autocorrelated “nonoptimal” codons.
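Codon autocorrelation, as mentioned in the abstract, can be illustrated with a simple count: walking along a coding sequence, each time an amino acid recurs, record whether it is encoded by the same codon as at its previous occurrence or by a synonymous one. The sketch below is a minimal illustration under stated assumptions, not the statistic used in the study: it assumes the standard genetic code and an in-frame sequence starting at position 0, and it skips stop codons and single-codon amino acids (Met, Trp), whose "reuse" is trivial. The function name `codon_reuse` is hypothetical.

```python
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO_ACIDS))

# Number of synonymous codons per amino acid (e.g. 6 for Arg, 1 for Met).
DEGENERACY = defaultdict(int)
for aa in CODE.values():
    DEGENERACY[aa] += 1


def codon_reuse(seq):
    """Hypothetical helper: count successive occurrences of each amino acid
    that reuse the previous codon ("same") or switch to a synonym ("different")."""
    seq = seq.upper().replace("U", "T")
    last_codon = {}  # amino acid -> codon used at its previous occurrence
    same = different = 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODE.get(codon)
        # Skip ambiguous codons, stop codons, and non-degenerate amino acids.
        if aa is None or aa == "*" or DEGENERACY[aa] == 1:
            continue
        if aa in last_codon:
            if last_codon[aa] == codon:
                same += 1
            else:
                different += 1
        last_codon[aa] = codon
    return same, different


# Arg encoded as CGT, CGT, AGA, CGT: one reuse, then two switches.
print(codon_reuse("CGTCGTAGACGT"))  # (1, 2)
```

Whether the observed amount of reuse exceeds chance would need a baseline; one option, assumed here rather than taken from the study, is to shuffle codons among positions encoding the same amino acid and recount.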

Highlights

  • Despite the relative universality of the genetic code and the conservation of the translation machinery across species, synonymous codons are not used equally, and codon biases vary dramatically between organisms and across genes within the same genome (Hershberg and Petrov 2008; Plotkin and Kudla 2011; Novoa and Ribas de Pouplana 2012; Shabalina et al 2013).

  • Beyond GC content, codon usage bias shows domain-specific patterns. The nonuniform usage of synonymous codons in a given sequence or genome can be measured as the relative synonymous codon usage (RSCU), defined as the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for its amino acid were used equally.

  • Upon hierarchical clustering of species based on their average RSCU, we find that species do not cluster following the tree of life but rather by GC content, suggesting that GC content is the major determinant of codon usage bias across species, in agreement with previous work (Hershberg and Petrov 2009).
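RSCU, as defined in the highlights, divides each codon's count by the count expected under equal use of its synonyms; for an amino acid with n synonymous codons this is RSCU = n × (codon count) / (total count of that amino acid's codons), so 1 means no bias and values above 1 mean the codon is overused. The minimal sketch below computes per-gene RSCU and a per-species average of the kind used for clustering above. It assumes the standard genetic code and skips stop codons, single-codon amino acids, and amino acids absent from a gene (where RSCU is undefined); the function names `rscu` and `mean_rscu` are hypothetical, not the authors' code.

```python
from collections import Counter, defaultdict
from statistics import mean

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Group synonymous codons by amino acid, e.g. SYNONYMS["R"] holds 6 codons.
SYNONYMS = defaultdict(list)
for codon, aa in zip(CODONS, AMINO_ACIDS):
    SYNONYMS[aa].append(codon)


def rscu(seq):
    """Hypothetical helper: RSCU of one in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) == 1:
            continue  # stops and Met/Trp carry no synonymous choice
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined for this gene
        for c in codons:
            # observed / expected, with expected = total / len(codons)
            values[c] = counts[c] * len(codons) / total
    return values


def mean_rscu(genes):
    """Hypothetical helper: average RSCU per codon over a species' genes,
    using only the genes in which that codon's amino acid occurs."""
    per_gene = [rscu(g) for g in genes]
    codons = {c for profile in per_gene for c in profile}
    return {c: mean(p[c] for p in per_gene if c in p) for c in sorted(codons)}


profile = rscu("CGTCGTAGACGT")  # Arg: CGT x3, AGA x1
print(profile["CGT"], profile["AGA"], profile["CGC"])  # 4.5 1.5 0.0
```

Stacking several species' `mean_rscu` profiles into a matrix with a fixed codon order and passing it to a hierarchical clustering routine such as SciPy's `scipy.cluster.hierarchy.linkage` would give an analysis of the kind described above; the distance metric and linkage method are left open here.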

Introduction

Despite the relative universality of the genetic code and the conservation of the translation machinery across species, synonymous codons are not used equally, and codon biases vary dramatically between organisms and across genes within the same genome (Hershberg and Petrov 2008; Plotkin and Kudla 2011; Novoa and Ribas de Pouplana 2012; Shabalina et al 2013). Various factors can influence codon usage bias within and across genomes, including protein expression level (Gouy and Gautier 1982; Ikemura 1985), GC content (Hershberg and Petrov 2009; Palidwor et al 2010), recombination rates (Marais et al 2001), translation efficiency (Sorensen et al 1989; Tuller, Waldman, et al 2010; Qian et al 2012), mRNA structure (Kudla et al 2009), codon position (Tuller, Carmi, et al 2010), mRNA stability (Presnyak et al 2015), and gene length (Eyre-Walker 1996; Duret and Mouchiroud 1999), amongst others. Codon usage variation within genomes (intraspecies codon usage) is often attributed to selection, given the significant positive correlation between protein expression levels and the presence of “preferred” or “optimal” codons (Sharp et al 1986; Duret and Mouchiroud 1999). Across species, by contrast, GC content is the major determinant: GC-rich organisms tend to favor GC-rich codons, whereas AT-rich organisms are enriched in AT-rich codons (Hershberg and Petrov 2009).
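The link between genomic GC content and codon choice is often quantified at the third codon position, where most synonymous differences between codons lie (GC3). The sketch below computes overall GC content and GC3 for an in-frame coding sequence; GC3 is a standard measure introduced here for illustration rather than taken from the text, and the function names `gc_content` and `gc3` are hypothetical.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous nucleotides (A, C, G, T/U)."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("no unambiguous nucleotides in sequence")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(seq):
    """Fraction of G or C at the third position of each complete codon."""
    seq = seq.upper()
    thirds = [seq[i + 2] for i in range(0, len(seq) - 2, 3)
              if seq[i + 2] in "ACGTU"]
    if not thirds:
        raise ValueError("no unambiguous third codon positions in sequence")
    return sum(b in "GC" for b in thirds) / len(thirds)


# Lys (AAG) and Asn (AAC) encoded with GC-ending codons:
# low overall GC, but every third position is G or C.
print(round(gc_content("AAGAAC"), 3), gc3("AAGAAC"))  # 0.333 1.0
```

Comparing GC3 across genomes of different overall GC content is one way to check the GC-rich versus AT-rich codon preference described above.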
