Abstract
Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique to probe into the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates implicit correction for mutational biases, facilitating exploration of the translational selection component of CUB. We validate this technique using gene expression data and we apply it to a detailed analysis of CUB in the Pseudomonadales. Our results illustrate how the selective enrichment of specific codons among highly expressed genes is preserved in the context of genome-wide shifts in codon frequencies, and how the balance between mutational and translational biases leads to varying definitions of codon optimality. We extend this analysis to other moderate- and fast-growing bacteria and we provide unified support for the hypothesis that C- and A-ending codons of two-box amino acids, and the U-ending codons of four-box amino acids, are systematically enriched among highly expressed genes across bacteria. The use of an unbiased estimator of CUB allows us to report for the first time that the signature of translational selection is strongly conserved in the Pseudomonadales in spite of drastic changes in genome composition, and extends well beyond the core set of highly optimized genes in each genome. We generalize these results to other moderate- and fast-growing bacteria, hinting at selection for a universal pattern of gene expression that is conserved and detectable in conserved patterns of codon usage bias.
Highlights
Protein-coding genes exhibit distinct patterns of codon usage, known as Codon Usage Bias (CUB).
Results and Discussion
scnRCA enhances the isolation of translational bias in mutationally biased genomes
To validate the hypothesis that the correction for genomic biases in scnRCA should enhance its ability to identify the effects of translational selection, we conducted a comprehensive benchmarking of scnRCA using available microarray expression data for moderate- and fast-growing bacterial species with and without compositional biases (Table S3).
The performance of the scnRCA algorithm was compared directly with the original self-consistent CAI (scCAI) implementation [31], with the MILC and d indices, which use annotated ribosomal proteins to define the reference set [40,41], and with the CDC index, which computes a direct estimate of deviation from the genomic average [42].
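The benchmarking described above compares per-gene index values against measured expression. The excerpt does not state which correlation statistic was used, so the following is only a minimal sketch that assumes a Spearman rank correlation; the function names (`average_ranks`, `pearson`, `spearman`) and the example values are hypothetical and not taken from the paper.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(index_scores, expression):
    """Spearman correlation (assumed statistic): Pearson on average ranks."""
    if len(index_scores) != len(expression):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(index_scores), average_ranks(expression))


# Hypothetical per-gene index scores and expression levels, same gene order.
rho = spearman([0.12, 0.45, 0.31, 0.80], [15.0, 220.0, 90.0, 1300.0])
```

Under this assumption, each index (scnRCA, scCAI, MILC, d, CDC) would be scored by calling `spearman` once per species with that index's per-gene values.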
Summary
Protein-coding genes exhibit distinct patterns of codon usage, known as Codon Usage Bias (CUB).
Correlation with expression data
For each species under analysis, scnRCA/scCAI was run until convergence on the reference genome sequence, and all sequences for protein-coding genes tagged as "ribosomal protein" were pooled to create the MILC and d reference set.
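The pooling step for the MILC and d reference set can be illustrated with a short sketch. It assumes genes are available as (annotation, coding sequence) pairs and that matching the tag is a case-insensitive substring test; both are assumptions made for illustration, as are the function names and the toy sequences. It does not reproduce the MILC or d computations themselves.

```python
from collections import Counter


def codons(seq):
    """Split a coding sequence into complete, unambiguous codons."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    triplets = (seq[i:i + 3] for i in range(0, usable, 3))
    return [c for c in triplets if set(c) <= set("ACGT")]


def pooled_reference_counts(genes, tag="ribosomal protein"):
    """Pool codon counts over every gene whose annotation contains `tag`."""
    counts = Counter()
    for annotation, seq in genes:
        if tag in annotation.lower():
            counts.update(codons(seq))
    return counts


def codon_frequencies(counts):
    """Convert pooled codon counts into relative frequencies."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Toy input: hypothetical annotations and sequences, not real genes.
genes = [
    ("50S ribosomal protein L1", "ATGAAAAAA"),
    ("hypothetical protein", "ATGCCC"),
    ("30S Ribosomal Protein S2", "ATGAAN"),  # codon with N is skipped
]
reference = pooled_reference_counts(genes)
frequencies = codon_frequencies(reference)
```

Pooling counts before normalizing weights each gene by its length, which is one reasonable reading of "pooled"; averaging per-gene frequencies instead would weight all genes equally.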