Abstract

Background

Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both the biopharmaceutical and research arenas in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low production levels and frequent failure for unclear reasons. The problem is accentuated because no generic solution is available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. Under given experimental conditions, including temperature and expression host, protein solubility is a feature ultimately determined by its sequence. To date, numerous machine learning based methods have been proposed to predict the solubility of a protein from its amino acid sequence alone. Despite 20 years of research on the matter, no comprehensive review of the published methods is available.

Results

This paper presents an extensive review of the existing models for predicting protein solubility in the Escherichia coli recombinant protein overexpression system. The models are investigated and compared with regard to the datasets used, features, feature selection methods, machine learning techniques and prediction accuracy. A discussion of the models is provided at the end.

Conclusions

This study aims to investigate extensively the machine learning based methods for predicting recombinant protein solubility, so as to offer both a general and a detailed understanding for researchers in the field. Some of the models offer acceptable prediction performance and convenient user interfaces. These models can be considered valuable tools for predicting recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.

Highlights

  • In biotechnology, production of recombinant proteins is a crucial process in both biopharmaceutical industries and scientific research

  • Heterologous expression is often afflicted with low levels of production and insoluble recombinant proteins forming inclusion bodies

  • Afterwards, the best models based on the obtained accuracies are introduced


Summary

Introduction

Production of recombinant proteins is a crucial process in both the biopharmaceutical industry and scientific research. Over the last 20 years it has been a crucial bioprocess in terms of human health, scientific impact and economic volume. Even though logical strategies of genetic engineering, such as strong promoters and codon optimization, are well established, protein overexpression is often still an art. The problem is accentuated because no generic solution is available to enhance heterologous overexpression. Under given experimental conditions, including temperature and expression host, protein solubility is a feature ultimately determined by its sequence. Despite 20 years of research on the matter, no comprehensive review of the published methods is available.

