Abstract

Protein solubility is an important thermodynamic parameter that is critical for characterizing a protein’s function, and a key determinant of protein production yield in both research and industrial (e.g., pharmaceutical) settings. Experimental approaches to predict protein solubility are costly, time-consuming, and frequently offer only low success rates. To reduce cost and expedite the development of therapeutic and industrially relevant proteins, a highly accurate computational tool for predicting protein solubility from protein sequence is sought. While a number of in silico prediction tools exist, they suffer from relatively low prediction accuracy, bias toward soluble proteins, and limited applicability across classes of proteins. In this study, we developed a novel deep learning sequence-based solubility predictor, DSResSol, that integrates squeeze excitation residual networks with dilated convolutional neural networks and outperforms all existing protein solubility prediction models. Using only protein sequence as input, the model captures frequently occurring amino acid k-mers and their local and global interactions, and highlights the importance of long-range interaction information between amino acid k-mers for achieving improved accuracy. DSResSol outperforms all available sequence-based solubility predictors by at least 5% in accuracy when evaluated on two independent test sets. Compared to existing predictors, DSResSol not only reduces prediction bias for insoluble proteins but also predicts soluble proteins within the test sets with an accuracy that is at least 13% higher than existing models. We derive the key amino acids, dipeptides, and tripeptides contributing to protein solubility, identifying glutamic acid and serine as critical amino acids for protein solubility prediction. Overall, DSResSol can be used for the fast, reliable, and inexpensive prediction of a protein’s solubility to guide experimental design.
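
The abstract credits dilated convolutions with capturing long-range interactions between k-mers. The following is a minimal pure-Python sketch of that general idea only, not the DSResSol implementation: the function names, kernel size, and dilation rates are illustrative assumptions and do not come from the paper. It shows how spacing the filter taps apart widens the stretch of sequence a single output position can see without adding weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D cross-correlation with `dilation` steps between taps.

    Hypothetical helper for illustration; not part of DSResSol.
    """
    # Number of input positions covered by one application of the filter.
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Input positions seen by one output of stacked stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    toy_signal = [1, 2, 3, 4, 5]  # stand-in for one channel of an encoded sequence
    print(dilated_conv1d(toy_signal, [1, 1, 1], dilation=1))
    print(dilated_conv1d(toy_signal, [1, 1, 1], dilation=2))
    # Assumed dilation schedule, chosen only to show the growth in coverage.
    print(receptive_field(3, [1, 2, 4]))
```

With a three-tap kernel, stacking layers with the assumed dilations 1, 2, and 4 lets one output position depend on 15 consecutive residues, whereas three undilated layers of the same size would cover only 7.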

Highlights

  • Solubility is a fundamental protein property that can give useful insights into a protein’s function or potential usability, for example, in foams, emulsions, and gels [1], and in therapeutic applications such as drug delivery [2,3].

  • By analyzing the DSResSol model results, we found that glutamine, serine, and aspartic acid are key amino acids that favorably contribute to protein solubility

  • DSResSol, a squeeze excitation residual network solubility predictor, outperforms all available bioinformatic tools for solubility prediction when performance is assessed by evaluation metrics such as accuracy and the Matthews correlation coefficient (MCC)
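
The two metrics named in the last highlight can be computed from the four confusion-matrix counts of a binary soluble/insoluble classifier. The sketch below uses the standard definitions; the label convention (1 = soluble, 0 = insoluble), the function names, and the toy labels are assumptions made for illustration and are not taken from the paper or its data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), assuming 1 = soluble and 0 = insoluble."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print(counts, accuracy(*counts), mcc(*counts))
```

Unlike accuracy, MCC uses all four counts symmetrically, so a predictor biased toward one class is penalized even when that class dominates the test set.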


Introduction

Solubility is a fundamental protein property that can give useful insights into a protein’s function or potential usability, for example, in foams, emulsions, and gels [1], and in therapeutic applications such as drug delivery [2,3]. Certain refolding methods exist that utilize weak promoters and fusion proteins or optimize expression conditions, e.g., by using low temperatures [4,5]. These methods cannot ensure the production of soluble proteins from a relatively small trial batch size, as they are limited by production cost and time. Reliable computational approaches for discovering potentially soluble protein targets for experimental testing can help to avoid expensive experimental trial-and-error approaches.
