Abstract

The genetic code is necessarily degenerate, with 64 possible nucleotide triplets being translated into 20 amino acids. Eighteen of the 20 amino acids are encoded by multiple synonymous codons. While synonymous codons are clearly equivalent in terms of the information they carry, it is now well established that they are used in a biased fashion. There is currently no consensus as to the origin of this bias. Drawing on ideas from stochastic thermodynamics, we derive from first principles a mathematical model describing the statistics of codon usage bias. We show that the model accurately describes the distribution of codon usage bias in genomes from the fungal and bacterial kingdoms. Based on it, we derive a new computational measure of codon usage bias, the distance D, which captures two aspects of codon usage bias: (i) differences in the genome-wide frequency of codons and (ii) apparently non-random distributions of codons across mRNAs. By means of a large-scale computational analysis of over 900 species across two kingdoms of life, we demonstrate that our measure provides novel biological insights. Specifically, we show that while codon usage bias is clearly based on heritable traits and closely related species show similar degrees of bias, there is considerable variation in the magnitude of D within taxonomic classes, suggesting that the contribution of sequence-level selection to codon bias varies substantially within relatively confined taxonomic groups. Interestingly, commonly used model organisms are near the median value of D for their taxonomic class, suggesting that they may not be good representative models for species with more extreme D, which include organisms of medical and agricultural interest. We also demonstrate that amino acid-specific patterns of codon usage are themselves quite variable between branches of the tree of life, and that some of this variability correlates with organismal tRNA content.
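The two aspects of bias named above can be illustrated with a toy computation. The sketch below is not the paper's distance D, whose definition is not reproduced in this summary; it is a hypothetical illustration, assuming the standard genetic code and uppercase in-frame DNA sequences. It computes (i) the genome-wide frequency of each codon within its synonymous family and (ii) an index of dispersion that compares per-gene codon counts with a binomial null in which every synonymous choice is an independent draw from the genome-wide frequencies. The helper names `genome_frequencies` and `dispersion_index` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G; "*" marks stops.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def codons(seq):
    """Sense codons of an uppercase in-frame DNA sequence (stops and non-ACGT triplets dropped)."""
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in triplets if CODON_TO_AA.get(c, "*") != "*"]


def genome_frequencies(genes):
    """Aspect (i): genome-wide frequency of each codon within its synonymous family."""
    counts = Counter(c for g in genes for c in codons(g))
    family_totals = Counter()
    for c, n in counts.items():
        family_totals[CODON_TO_AA[c]] += n
    return {c: n / family_totals[CODON_TO_AA[c]] for c, n in counts.items()}


def dispersion_index(genes, codon):
    """Aspect (ii): mean squared standardized deviation of per-gene counts of `codon`
    from a binomial expectation. Values near 1 are consistent with the codon being
    scattered at random across genes; values well above 1 indicate clustering."""
    p = genome_frequencies(genes).get(codon, 0.0)
    if p in (0.0, 1.0):
        return float("nan")  # no synonymous variation to test
    aa = CODON_TO_AA[codon]
    terms = []
    for g in genes:
        cs = codons(g)
        n = sum(1 for c in cs if CODON_TO_AA[c] == aa)  # synonymous sites in this gene
        if n == 0:
            continue
        k = cs.count(codon)
        terms.append((k - n * p) ** 2 / (n * p * (1 - p)))
    return sum(terms) / len(terms)


segregated = ["AAAAAAAAAAAA", "AAGAAGAAGAAG"]  # each gene uses only one Lys codon
mixed = ["AAAAAGAAAAAG", "AAGAAAAAGAAA"]       # both genes use AAA and AAG equally
print(genome_frequencies(segregated))      # {'AAA': 0.5, 'AAG': 0.5}
print(dispersion_index(segregated, "AAA"))  # 4.0
print(dispersion_index(mixed, "AAA"))       # 0.0
```

Both toy genomes have identical genome-wide frequencies, so aspect (i) alone cannot tell them apart; only the per-gene comparison exposes the non-random placement of codons in the segregated case.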

Highlights

  • Codon usage bias (CUB), the preferential use of some types of codons over others encoding the same amino acid during protein synthesis, is an empirically well-established phenomenon

  • Existing model-based approaches include the ‘Effective Number of Codons’ (ENc), which essentially performs a statistical test against the null hypothesis that codon usage is solely governed by genomic GC content [4]; models based on the combined forces of mutation bias and selection for minimal energy usage during translation [5,6]; and models based on the forces of mutation, selection and genetic drift in populations [7,8]

  • For the purpose of this article, we focus exclusively on synonymous mutations, whereby a codon a encoding an amino acid is exchanged for a codon a′ that encodes the same amino acid
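The ENc mentioned in the highlights is usually computed with Wright's (1990) formula, which this summary does not spell out. As a minimal sketch, assuming the standard genetic code and enough counts per amino acid, the Python below computes the per-amino-acid homozygosity F = (n Σ p_i² − 1)/(n − 1), averages it within each degeneracy class, and combines the averages as ENc = 2 + 9/F₂ + 1/F₃ + 5/F₄ + 3/F₆. One common way to frame the GC-content null hypothesis is Wright's expected curve ENc = 2 + s + 29/(s² + (1 − s)²), where s is the GC content at third codon positions. The function names `enc` and `expected_enc` are illustrative and do not come from the paper.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G; "*" marks stops.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
FAMILY_SIZE = Counter(aa for aa in AA if aa != "*")  # e.g. Leu -> 6, Ile -> 3, Met -> 1
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}  # number of amino acids per degeneracy class


def enc(codon_counts):
    """Wright's effective number of codons from a {codon: count} mapping.

    Assumes each degeneracy class has at least one amino acid with n >= 2 and F > 0;
    sparse data would need extra handling. Capped at 61, the number of sense codons.
    """
    by_aa = defaultdict(Counter)
    for codon, n in codon_counts.items():
        aa = CODON_TO_AA.get(codon, "*")
        if aa != "*" and FAMILY_SIZE[aa] > 1:  # skip stops, Met and Trp
            by_aa[aa][codon] += n
    f_by_class = defaultdict(list)
    for aa, counts in by_aa.items():
        n = sum(counts.values())
        if n < 2:
            continue
        sum_p2 = sum((c / n) ** 2 for c in counts.values())
        f_by_class[FAMILY_SIZE[aa]].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f:  # Ile missing: average the 2- and 4-fold classes
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    return min(61.0, 2 + sum(m / mean_f[k] for k, m in CLASS_SIZES.items()))


def expected_enc(gc3):
    """Wright's expected ENc if codon usage were shaped only by GC3 content."""
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)


uniform = {c: 1000 for c, aa in CODON_TO_AA.items() if aa != "*"}
first_codon = {}
for c, aa in CODON_TO_AA.items():
    if aa != "*":
        first_codon.setdefault(aa, c)
extreme = {c: 1000 for c in first_codon.values()}
print(enc(uniform))       # 61.0 (every synonym used equally)
print(enc(extreme))       # 20.0 (one codon per amino acid)
print(expected_enc(0.5))  # 60.5
```

Comparing an observed ENc with `expected_enc` at the same GC3 is what turns the number into a test against the GC-only null: values well below the curve suggest that forces other than base composition are at work.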

Summary

Introduction

Codon usage bias (CUB), the preferential use of some types of codons over others encoding the same amino acid during protein synthesis, is an empirically well-established phenomenon. Existing approaches to studying codon usage use a number of measures for the frequency of individual codons relative to particular ‘optimal’ reference codons. Such measures include the frequency of optimal codons Fopt [1], the codon bias index CBI [2], and the codon adaptation index CAI [3]. Current methods cannot be applied to many interesting organisms, such as organisms with extreme lifestyles like parasites [21,22,23] or thermophiles. In this contribution, we propose a fresh approach to quantifying CUB that neither relies on a choice of comparison sets nor assumes a particular mechanistic model of codon usage bias. Our analysis will show that (i) the distance measure D captures relevant biology, (ii) sequence-level selection (SLS) strongly contributes to shaping overall codon usage bias in most organisms, (iii) amino acid-specific patterns of codon usage are variable between branches of the tree of life, and (iv) D varies with the ecological niche of the organism.
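To make the dependence on reference sets concrete, here is a minimal Python sketch of two of the cited measures, following their usual textbook definitions rather than any code from this work. CAI is taken as the geometric mean of relative adaptiveness values w = x_c / max x_syn estimated from a chosen reference set of genes, and Fopt as the fraction of codons that belong to a chosen set of ‘optimal’ codons. The reference genes and the optimal-codon set are exactly the inputs that the approach proposed here avoids. As a simplifying assumption, codons absent from the reference set are skipped rather than given a default weight; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G; "*" marks stops.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
NO_CHOICE = {"M", "W", "*"}  # Met, Trp and stops offer no synonymous choice


def codons(seq):
    """In-frame codons of an uppercase DNA coding sequence."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes):
    """w_c = count(c) / count(most used synonym), estimated from a reference set."""
    counts = Counter(c for g in reference_genes for c in codons(g)
                     if CODON_TO_AA.get(c, "*") not in NO_CHOICE)
    top = Counter()
    for c, n in counts.items():
        top[CODON_TO_AA[c]] = max(top[CODON_TO_AA[c]], n)
    return {c: n / top[CODON_TO_AA[c]] for c, n in counts.items()}


def cai(gene, w):
    """Geometric mean of w over the gene's codons that have a weight."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs))


def fopt(gene, optimal):
    """Fraction of optimal codons among amino acids that have an optimal codon."""
    covered = {CODON_TO_AA[c] for c in optimal}
    relevant = [c for c in codons(gene) if CODON_TO_AA.get(c) in covered]
    return sum(c in optimal for c in relevant) / len(relevant)


# Hypothetical reference set: Lys encoded by AAA twice as often as by AAG.
w = relative_adaptiveness(["AAAAAAAAG"])
print(w)                           # {'AAA': 1.0, 'AAG': 0.5}
print(cai("AAAAAG", w))            # sqrt(0.5), about 0.7071
print(fopt("AAAAAGGAA", {"AAA"}))  # 0.5 (GAA, Glu, has no optimal codon listed)
```

Changing the reference genes or the optimal-codon set changes both scores for the same gene, which is why such measures are hard to apply to organisms for which no trusted reference set exists.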

Codon selection as a random walk
Deriving the full model
Genomic data bear the signature of SLS
The full model fits the fungal data better than the binomial model
Genome-wide analysis of fungal genomes using the distance measure
D reveals amino acid-specific patterns of codon selection pressure
Discussion
The dataset