Abstract

Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique to probe the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates implicit correction for mutational biases, facilitating exploration of the translational selection component of CUB. We validate this technique using gene expression data and apply it to a detailed analysis of CUB in the Pseudomonadales. Our results illustrate how the selective enrichment of specific codons among highly expressed genes is preserved in the context of genome-wide shifts in codon frequencies, and how the balance between mutational and translational biases leads to varying definitions of codon optimality. We extend this analysis to other moderate- and fast-growing bacteria and provide unified support for the hypothesis that the C- and A-ending codons of two-box amino acids, and the U-ending codons of four-box amino acids, are systematically enriched among highly expressed genes across bacteria. The use of an unbiased estimator of CUB allows us to report for the first time that the signature of translational selection is strongly conserved in the Pseudomonadales in spite of drastic changes in genome composition, and that it extends well beyond the core set of highly optimized genes in each genome. We generalize these results to other moderate- and fast-growing bacteria, hinting at selection for a universal pattern of gene expression that is conserved and detectable in patterns of codon usage bias.

Highlights

  • Protein-coding genes exhibit distinct patterns of codon usage, known as Codon Usage Bias (CUB)

  • Results and Discussion: scnRCA enhances the isolation of translational bias in mutationally biased genomes. To validate the hypothesis that the correction for genomic biases in scnRCA should enhance its ability to identify the effects of translational selection, we conducted a comprehensive benchmarking of scnRCA using available microarray expression data for moderate- and fast-growing bacterial species with and without compositional biases (Table S3)

  • The performance of the scnRCA algorithm was compared directly with the original self-consistent CAI (scCAI) implementation [31], with the MILC and d indices, which use annotated ribosomal proteins to define the reference set [40,41], and with the CDC index, which computes a direct estimate of deviation from the genomic average [42]

Introduction

Protein-coding genes exhibit distinct patterns of codon usage, known as Codon Usage Bias (CUB).

Correlation with expression data

For each species under analysis, scnRCA/scCAI was run until convergence on the reference genome sequence, and all sequences for protein-coding genes tagged as "ribosomal protein" were pooled to create the MILC and d reference set.
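
The pooling of ribosomal protein genes into a reference set can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `(product, sequence)` input format, the case-insensitive substring match on the annotation, and the conversion of T to U are all assumptions made for illustration. The sketch only counts codons in the pooled set; it does not compute scnRCA, scCAI, MILC, or d.

```python
from collections import Counter


def pool_reference_codons(genes, tag="ribosomal protein"):
    """Pool codon counts over all genes whose annotation contains `tag`.

    `genes` is an iterable of (product_description, coding_sequence) pairs.
    This input format is hypothetical, chosen only for the example.
    """
    counts = Counter()
    for product, seq in genes:
        # Assumption: the tag is matched as a case-insensitive substring.
        if tag not in product.lower():
            continue
        # Work in the RNA alphabet so codons read as, e.g., "GCU".
        seq = seq.upper().replace("T", "U")
        # Walk the sequence in frame, ignoring any trailing partial codon.
        usable = len(seq) - len(seq) % 3
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def codon_frequencies(counts):
    """Convert pooled codon counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Toy sequences, not real genes.
    example_genes = [
        ("30S ribosomal protein S1", "ATGAAAGCT"),
        ("DNA polymerase III subunit", "ATGCCCGGG"),
        ("50S ribosomal protein L2", "ATGAAATAA"),
    ]
    pooled = pool_reference_codons(example_genes)
    for codon, freq in sorted(codon_frequencies(pooled).items()):
        print(codon, round(freq, 3))
```

Note that this sketch counts stop codons along with sense codons; a real analysis would typically exclude them and would read sequences and annotations from a genome file rather than from an in-memory list.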

