Abstract

Background: Custom genes have become a common resource in recombinant biology over the last 20 years due to the plummeting cost of DNA synthesis. These genes are often “optimized” to non-native sequences for overexpression in a non-native host by substituting synonymous codons within the coding DNA sequence (CDS). A handful of studies have compared native and optimized CDSs, reporting different levels of soluble product due to the accumulation of misfolded aggregates, variable activity of enzymes, and (at least one report of) a change in substrate specificity. No study, to the best of our knowledge, has performed a practical comparison of CDSs generated from different codon optimization algorithms or reported the corresponding protein yields.

Results: In our efforts to understand what factors constitute an optimized CDS, we identified that there is little consensus among codon-optimization algorithms, a roughly equivalent chance that an algorithm-optimized CDS will increase or diminish recombinant yields as compared to the native DNA, a near ubiquitous use of a codon database that was last updated in 2007, and a high variability of output CDSs by some algorithms. We present a case study, using KRas4B, to demonstrate that a median codon frequency may be a better predictor of soluble yields than the more commonly utilized CAI metric.

Conclusions: We present a method for visualizing, analyzing, and comparing algorithm-optimized DNA sequences for recombinant protein expression. We encourage researchers to consider if DNA optimization is right for their experiments, and work towards improving the reproducibility of published recombinant work by publishing non-native CDSs.

Highlights

  • Custom genes have become a common resource in recombinant biology over the last 20 years due to the plummeting cost of DNA synthesis

  • The popularity of E. coli is reflected in the RCSB (Research Collaboratory for Structural Bioinformatics), with 73 ± 3% of human proteins being made in E. coli since 2000 (Fig. 1), and in the development of several new commercial expression strains over the last decade [2]

  • Sharp and Li originally proposed a system in which the usage of each codon is normalized against the other synonymous codons for the same amino acid within a gene, giving the relative synonymous codon usage (RSCU) [8]
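
The normalization described in the last highlight can be illustrated with a minimal sketch. It assumes the commonly cited form of the measure, in which a codon's observed count is divided by the count expected if all synonymous codons for that amino acid were used equally; the excerpt itself does not give the formula. The function name `rscu` and the example sequence are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for every codon family that occurs in the CDS.

    RSCU = observed count / (family total / number of synonymous codons).
    Families with no occurrences in the sequence are omitted.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")

    # Count in-frame codons.
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)  # count under equal synonymous usage
        for c in family:
            values[c] = counts[c] / expected
    return values


# Hypothetical toy sequence: Ala (GCT), Ala (GCT), Ala (GCC), Lys (AAA).
example = rscu("GCTGCTGCCAAA")
```

Under this definition a value of 1 means a codon is used exactly as often as expected under equal usage, values above 1 indicate preferred codons, and the values within a family sum to the number of synonymous codons in that family.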

Introduction

Custom genes have become a common resource in recombinant biology over the last 20 years due to the plummeting cost of DNA synthesis. These genes are often “optimized” to non-native sequences for overexpression in a non-native host by substituting synonymous codons within the coding DNA sequence (CDS). Many protein targets are generated through heterologous expression of recombinant proteins. Although the biological principles of recombinant protein expression are well established, distinguishing protein targets that express well from those that express poorly is still considered a “black box” process that often requires screening many conditions to obtain a soluble product.