Abstract

Background: Codon usage bias has been widely reported to correlate with GC composition. However, the quantitative relationship between codon usage bias and GC composition across species has not been reported.

Results: Based on an informatics method (SCUO) that we developed previously using Shannon information theory and maximum entropy theory, we investigated the quantitative relationship between codon usage bias and GC composition. Regression on 70 bacterial and 16 archaeal genomes gave, in bacteria, SCUO = -2.06 * GC3 + 2.05 * (GC3)^2 + 0.65 (r = 0.91), and, in archaea, SCUO = -1.79 * GC3 + 1.85 * (GC3)^2 + 0.56 (r = 0.89). We developed an analytical model, based on SCUO, that quantifies synonymous codon usage bias from GC composition. The parameters of this model were inferred by inspecting the relationship between codon usage bias and GC composition across the same 70 bacterial and 16 archaeal genomes. We further simplified this relationship so that it uses only GC3. This simple model was supported by computational simulation.

Conclusions: The synonymous codon usage bias can be expressed simply as 1 + (p/2) log2(p/2) + ((1-p)/2) log2((1-p)/2), where p = GC3. The software we developed for measuring SCUO (codonO) is available at .
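The three formulas quoted in the abstract can be evaluated side by side. The following is a minimal sketch, not the codonO software: the function names `scuo_from_gc3` and `scuo_regression` are hypothetical, the coefficients are copied from the abstract, and treating x * log2(x) as 0 when x = 0 is an assumption made here so that the endpoints GC3 = 0 and GC3 = 1 are defined.

```python
import math


def scuo_from_gc3(p: float) -> float:
    """Simplified model quoted in the abstract, with p = GC3:
    1 + (p/2) log2(p/2) + ((1-p)/2) log2((1-p)/2).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("GC3 must lie in [0, 1]")

    def term(x: float) -> float:
        # x * log2(x); assumption: use the limiting value 0 when x == 0.
        return x * math.log2(x) if x > 0 else 0.0

    return 1.0 + term(p / 2) + term((1 - p) / 2)


def scuo_regression(gc3: float, domain: str = "bacteria") -> float:
    """Quadratic regression fits quoted in the abstract.

    The 'domain' keyword and dictionary layout are illustrative choices.
    """
    coeffs = {
        "bacteria": (-2.06, 2.05, 0.65),  # r = 0.91 in the abstract
        "archaea": (-1.79, 1.85, 0.56),   # r = 0.89 in the abstract
    }
    linear, quadratic, intercept = coeffs[domain]
    return linear * gc3 + quadratic * gc3 ** 2 + intercept


if __name__ == "__main__":
    # Tabulate the simplified model against both regression fits.
    for gc3 in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(
            f"GC3={gc3:.1f}  model={scuo_from_gc3(gc3):.3f}  "
            f"bacteria={scuo_regression(gc3, 'bacteria'):.3f}  "
            f"archaea={scuo_regression(gc3, 'archaea'):.3f}"
        )
```

Under this sketch the simplified expression is symmetric about GC3 = 0.5, where it evaluates to 0, and it reaches 0.5 at GC3 = 0 or 1. Both regression curves are likewise U-shaped, with a minimum near GC3 = 0.5.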

Highlights

  • Codon usage bias has been widely reported to correlate with GC composition

  • We presented an analytical model to quantify the non-linear relationship between GC3 and a measurement of codon usage bias, which reveals that GC3 is the key factor driving synonymous codon usage and that this mechanism is independent of species

  • We recently developed an informatics method [13] to provide an estimate for the orderliness of synonymous codon usage (SCUO) and the amount of synonymous codon usage bias
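The highlights treat GC3 as the key input, but this excerpt does not define it. The sketch below assumes the common definition, namely the fraction of G or C bases at third codon positions of an in-frame coding sequence. The function name `gc3` is hypothetical, and counting every codon (including start and stop codons) is a simplification that the source does not confirm.

```python
def gc3(cds: str) -> float:
    """Fraction of G/C at third codon positions of an in-frame coding sequence.

    Assumption: all codons are counted; the source may exclude some
    (for example, codons of non-degenerate amino acids or stop codons).
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    third_positions = seq[2::3]  # every third base, starting at codon position 3
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


if __name__ == "__main__":
    # Codons: ATG GCC AAA TTT, so the third positions are G, C, A, T.
    print(gc3("ATGGCCAAATTT"))
```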


Introduction

Codon usage bias has been widely reported to correlate with GC composition. However, the quantitative relationship between codon usage bias and GC composition across species has not been reported. Previous codon usage analyses showed that codon usage bias is complex and is associated with various biological factors, such as gene expression level [7,8,9,10], gene length [11,12,13], gene translation initiation signal [14], protein amino acid composition [6,15], protein structure [16,17], tRNA abundance [18,19,20,21], mutation frequency and patterns [22,23], and GC composition [24,25,26,27]. GC composition may be described at three levels: 1) Overall GC content. Local GC composition is defined based on ...
