Abstract

Computational methods that predict protein stability changes induced by missense mutations have advanced considerably over the past decades. Most available methods, however, have very limited accuracy in predicting stabilizing mutations, because existing experimental datasets are dominated by mutations that reduce protein stability. Moreover, few approaches perform consistently well across different test cases. To address these issues, we developed a new computational method, PremPS, to more accurately evaluate the effects of missense mutations on protein stability. PremPS uses only ten evolutionary- and structure-based features and is parameterized on a balanced dataset with an equal number of stabilizing and destabilizing mutations. A comprehensive comparison of the predictive performance of PremPS with other available methods on nine benchmark datasets confirms that our approach consistently outperforms other methods and shows considerable improvement in estimating the impacts of stabilizing mutations. A protein may have multiple structures available, and structure-based methods may predict a different stability change when a different structure of the same protein is used. We therefore estimated the impact of using different structures on prediction accuracy and demonstrate that our method performs well across different types of structures, except for low-resolution structures and models built from templates with low sequence identity. PremPS can be used for finding functionally important variants, revealing the molecular mechanisms of functional influences, and protein design. PremPS is freely available at https://lilab.jysw.suda.edu.cn/research/PremPS/ and supports large-scale mutational scanning; it takes about four minutes to perform calculations for a single mutation in a protein with ~300 residues and requires ~0.4 seconds for each additional mutation.
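As a rough illustration of what the quoted timings imply for mutational scanning, the sketch below extrapolates them linearly. It assumes the two figures from the abstract (about four minutes for the first mutation, ~0.4 seconds for each additional one) hold for a ~300-residue protein and that the cost grows linearly with the number of mutations; the function name and parameters are hypothetical and are not part of PremPS.

```python
def estimated_runtime_seconds(n_mutations, first_s=240.0, extra_s=0.4):
    """Linear extrapolation of the timings quoted in the abstract.

    first_s: assumed cost of the first mutation (~4 minutes).
    extra_s: assumed cost of each additional mutation (~0.4 seconds).
    Both values are approximate and apply to a ~300-residue protein.
    """
    if n_mutations < 1:
        raise ValueError("n_mutations must be at least 1")
    return first_s + extra_s * (n_mutations - 1)


# Full single-site scan: 19 alternative amino acids at each of 300 positions.
n_scan = 300 * 19
minutes = estimated_runtime_seconds(n_scan) / 60.0
print(f"{n_scan} mutations: about {minutes:.0f} minutes")
```

Under these assumptions, a complete single-site scan of a 300-residue protein (5700 mutations) would take roughly 42 minutes, most of it spent on the per-mutation increments rather than the initial setup.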

Highlights

  • Protein stability is one of the most important factors that characterize protein function, activity, and regulation [1]

  • The development of computational methods to accurately predict the impacts of amino acid substitutions on protein stability is of paramount importance for the field of protein design and understanding the roles of missense mutations in disease

  • In addition to Random Forest, we tried two other popular learning algorithms, Support Vector Machine (SVM) and eXtreme Gradient Boosting (XGBoost); the results in S3 Table indicate that the random forest regression model performs best


Introduction

Protein stability is one of the most important factors that characterize protein function, activity, and regulation [1]. Several studies have shown that mutations can be deleterious because they decrease or enhance the stability of the corresponding protein [10,11,12,13,14,15]. Quantifying the effects of mutations on protein stability requires estimating the changes in folding/unfolding Gibbs free energy that they induce. Experimental measurements of protein stability changes are laborious and appropriate only for proteins that can be purified [16]. Computational prediction is therefore urgently needed; it would help prioritize potentially functionally important variants and is vital to many fields, such as medical applications [17] and protein design [18].
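The free energy change mentioned above is usually reported as a single number, ΔΔG, the difference between the mutant and wild-type free energies. The sketch below shows that arithmetic under one assumed sign convention: folding free energies in kcal/mol, with ΔΔG = ΔG(mutant) − ΔG(wild type), so that positive values are destabilizing. The excerpt does not state which convention PremPS uses, and datasets differ on this point, so the convention, the function names, and the tolerance parameter are all illustrative assumptions.

```python
def ddg_folding(dg_fold_wt, dg_fold_mut):
    """Change in folding free energy (kcal/mol) caused by a mutation.

    Assumed convention: ddG = dG_fold(mutant) - dG_fold(wild type).
    A more negative folding free energy means a more stable protein,
    so a positive ddG means the mutation is destabilizing.
    """
    return dg_fold_mut - dg_fold_wt


def classify(ddg, tol=0.0):
    """Label a mutation by the sign of ddG under the assumed convention.

    tol is a hypothetical dead band around zero for near-neutral changes.
    """
    if ddg > tol:
        return "destabilizing"
    if ddg < -tol:
        return "stabilizing"
    return "neutral"


# Hypothetical values: wild type folds with -8.0 kcal/mol, mutant with -6.5.
ddg = ddg_folding(-8.0, -6.5)
print(ddg, classify(ddg))
```

If a dataset reports unfolding free energies instead, the sign of ΔΔG flips, so the convention should be checked before values from different sources are combined.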
