Heterologous protein production is a key technology in biotechnology, the health sciences and many other research fields. Various approaches have been developed to optimize it, but research has emphasized protein yield rather than protein quality. In this study, we established a workflow for synthetic gene optimization for heterologous protein expression that combines bioinformatics, laboratory experiments, mass spectrometry and statistical analysis. Two gene primary structure analysis platforms, Anaconda and EuGene, together with multivariate optimization methods, were used to redesign the Plasmodium falciparum lysyl-tRNA synthetase gene for optimal expression in Escherichia coli. The synthetic genes were expressed from common vectors, and amino acid mis-incorporations in the expressed proteins were detected and quantified by mass spectrometry. The association between the identified amino acid mis-incorporations and 23 gene variables was then analysed. The synthetic genes yielded significantly more protein than the wild-type gene. However, 71 amino acid mis-incorporation sites were observed along the whole protein and across the synthetic genes, and these sites were statistically associated with specific codons and protein secondary structures. The optimization method that produced the most accurate protein was a multivariate approach combining variables known to influence mRNA translation.