Abstract

Heterologous expression is the main approach for recombinant protein production in genetic synthesis, for which codon optimization is necessary. Existing optimization methods are based on biological indexes. In this paper, we propose a novel codon optimization method based on deep learning. First, we introduce the concept of codon boxes, via which DNA sequences can be recoded into codon box sequences while ignoring the order of bases. The problem of codon optimization can then be converted to sequence annotation of the corresponding amino acids with codon boxes. The codon optimization models for Escherichia coli were trained with a Bidirectional Long Short-Term Memory Conditional Random Field (BiLSTM-CRF). In theory, deep learning is well suited to capturing the distribution characteristics of DNA. In addition to a comparison of the codon adaptation index, protein expression experiments for a Plasmodium falciparum candidate vaccine and polymerase acidic protein were carried out to compare our sequences with the original sequences and with the optimized sequences from Genewiz and ThermoFisher. The results show that our method for enhancing protein expression is efficient and competitive.
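The recoding step can be illustrated with a minimal sketch. It assumes that a codon box is simply the codon's three bases with their order discarded (represented here by sorting them alphabetically); the abstract only says the order of bases is ignored, so this representation and the function names `codon_box` and `to_codon_boxes` are illustrative assumptions, not the authors' implementation.

```python
def codon_box(codon: str) -> str:
    """Return an order-independent label for a codon.

    Assumption: a codon box is the multiset of the codon's bases,
    written here as the bases sorted alphabetically.
    """
    return "".join(sorted(codon.upper()))


def to_codon_boxes(dna: str) -> list[str]:
    """Split a coding sequence into codons and recode each as its box."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [codon_box(dna[i:i + 3]) for i in range(0, len(dna), 3)]


# Codons that differ only in base order fall into the same box.
boxes = to_codon_boxes("ATGGCTTAA")
```

Under this assumption, optimization becomes a labelling task: each amino acid in the protein sequence is tagged with one codon box, and the amino acid together with its box identifies the codon to use.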

Highlights

  • Heterologous expression is the main approach for recombinant protein production in genetic synthesis, for which codon optimization is necessary

  • The frequency of codons in a DNA sequence is positively correlated with the corresponding tRNA in a species, and the tRNA concentration determines the number of amino acids available for protein translation extension, which in turn affects the efficiency of protein synthesis[5,6]

  • Because the codon adaptation index (CAI) is only one of the factors that affect protein expression, the rationality of our codon optimization method was further validated experimentally: the FALVAC-1 protein (constructed as a multivalent Plasmodium falciparum vaccine antigen and expressed in E. coli) and the PTP4A3 protein were expressed in E. coli, and their expression levels were analyzed by western blot

Summary

Introduction

Heterologous expression is the main approach for recombinant protein production in genetic synthesis, for which codon optimization is necessary. One proposed strategy is to adjust the original codon sequence to match the natural distribution of the host codons[13,17,18,19], with the goal of preserving the slow translation regions that are important for protein folding[9,10,20]. [...] Thermofisher.com) and Genewiz (www.genewiz.com), whose methods are based on the aforementioned strategies and empirical indexes. As a consequence, their indexes for codon optimization mainly include the codon adaptation index (CAI)[21], the frequency of relative synonymous codon usage[22], the codon bias index[23], optimal codon usage[7], and effective codon number[24]. The CAI is the primary index used to predict gene expression level because it indicates the extent to which the coding sequence represents the usage of codons in an organism[25].
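To make the CAI concrete, the sketch below computes it in its standard form: each codon gets a relative adaptiveness weight (its count in a reference set divided by the count of the most frequent synonymous codon), and the CAI of a sequence is the geometric mean of the weights of its codons. The two-amino-acid table `SYNONYMOUS` and the reference counts are toy values chosen for illustration, not data from the paper, and skipping codons that have no weight is a simplification.

```python
import math
from collections import Counter

# Hypothetical toy table: only lysine (K) and phenylalanine (F).
SYNONYMOUS = {
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
}


def relative_adaptiveness(reference_codons):
    """Weight of each codon = its count / count of the most used synonym."""
    counts = Counter(reference_codons)
    weights = {}
    for codons in SYNONYMOUS.values():
        max_count = max(counts[c] for c in codons)
        if max_count == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            weights[c] = counts[c] / max_count
    return weights


def cai(sequence, weights):
    """Geometric mean of the weights of the codons in `sequence`.

    Simplification: codons with no weight or zero weight are skipped.
    """
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy reference set: AAA and TTC are the preferred codons.
reference = ["AAA"] * 3 + ["AAG"] + ["TTT"] + ["TTC"] * 2
weights = relative_adaptiveness(reference)
preferred_score = cai("AAATTC", weights)  # uses only preferred codons
rare_score = cai("AAGTTT", weights)       # uses only the rarer codons
```

A sequence built entirely from the most frequent codons scores 1, and the score drops as rarer synonymous codons are substituted, which is why the CAI is used as a proxy for expression level.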
