Abstract

Pichia pastoris is commonly used to produce recombinant proteins because it preferentially secretes them, which lowers production costs and increases yields of the target protein. However, not all recombinant proteins are successfully secreted by P. pastoris. A computational method that predicts the likelihood of a protein being secreted into the supernatant would be of considerable value; to the best of our knowledge, no such tool has yet been developed. We present a machine-learning approach, Presep, that assesses the likelihood of a recombinant protein being secreted by P. pastoris based on its pseudo amino acid composition (PseAA). In 20-fold cross-validation, Presep achieved a Matthews correlation coefficient (MCC) of 0.78 and an overall accuracy (Q2) of 95%. The computational results were validated experimentally: six β-galactosidase genes were expressed in P. pastoris strain GS115 to verify the model's predictions. A strong correlation (R² = 0.967) was observed between the secretion propensity predicted by Presep and the experimentally measured secretion percentage. Together, these results demonstrate that Presep can predict the secretion propensity of a given protein in P. pastoris. The model may serve as a valuable tool for assessing the suitability of P. pastoris as a host organism before biological experiments begin. The Presep prediction tool can be freely downloaded at http://www.mobioinfor.cn/Presep.
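The two scores quoted in the abstract, MCC and Q2, are both computed from the confusion matrix of a binary classifier (here, secreted versus not secreted). The following is a minimal sketch of the standard definitions, not code from Presep itself; the function name `q2_and_mcc` and the example counts are illustrative assumptions.

```python
import math


def q2_and_mcc(tp, tn, fp, fn):
    """Return (Q2, MCC) for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives. The counts used with this
    function are hypothetical, not taken from the Presep study.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Q2: overall accuracy, the fraction of correct predictions.
    q2 = (tp + tn) / total

    # MCC: correlation between predicted and observed classes,
    # ranging from -1 (total disagreement) to +1 (perfect prediction).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q2, mcc


if __name__ == "__main__":
    q2, mcc = q2_and_mcc(tp=50, tn=40, fp=5, fn=5)
    print(f"Q2 = {q2:.3f}, MCC = {mcc:.3f}")
```

Reporting MCC alongside Q2 is useful when the two classes are unbalanced, because accuracy alone can be high even if the minority class is poorly predicted.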

Highlights

  • Pichia pastoris is one of the most frequently used organisms for the heterologous production of recombinant proteins

  • We propose the Presep method (Predicting the propensity of a protein being secreted into the supernatant when expressed in P. pastoris) to identify the secretion state of proteins in P. pastoris based on the ensemble learning method random forests (RF)

  • To train the models used for Presep, we constructed the Secreprot dataset containing 1093 proteins experimentally validated in P. pastoris
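Presep describes each protein by its pseudo amino acid composition (PseAA) before classification. As a rough illustration of what such a feature vector looks like, the sketch below follows the general form of Chou's type 1 pseudo amino acid composition: 20 normalized residue frequencies plus a few sequence-order correlation terms. It is written under stated assumptions and is not the Presep implementation: the function name `pseaa`, the parameters `lam` and `weight`, and the use of the Kyte-Doolittle hydropathy scale as the only physicochemical property are all choices made here for illustration.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (an assumed property choice;
# the properties actually used by Presep are not stated here).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = list(scale.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def pseaa(sequence, lam=3, weight=0.05, scales=(KYTE_DOOLITTLE,)):
    """Return a (20 + lam)-dimensional PseAA-style feature vector."""
    seq = sequence.upper()
    if any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    std_scales = [_standardize(s) for s in scales]

    # Conventional amino acid composition: 20 residue frequencies.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # Sequence-order terms: for each gap k, the mean squared property
    # difference between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        pairs = list(zip(seq, seq[k:]))
        theta = mean(
            mean((s[a] - s[b]) ** 2 for s in std_scales) for a, b in pairs
        )
        thetas.append(theta)

    # Normalize so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the shape of the output.
    features = pseaa("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
    print(len(features))
```

A vector of this kind could then be passed to a random forest classifier; the extra `lam` components are what distinguish PseAA from plain amino acid composition, since they retain some information about residue order.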

Summary

Introduction

Pichia pastoris is one of the most frequently used organisms for the heterologous production of recombinant proteins. A method that predicts the likelihood of a protein being secreted into the supernatant before being expressed in P. pastoris would be of considerable value; to the best of our knowledge, no such tool has yet been developed.

