Abstract

Accurate prediction of protein stability changes upon mutation (ΔΔG) is increasingly important to evolution studies, protein engineering, and screening of disease-causing gene variants but is challenged by biases in training data. We investigated 45 linear regression models trained on data sets that account systematically for destabilization bias and mutation-type bias BM . The models were externally validated on three test data sets probing different pathologies and for internal consistency (symmetry and neutrality). Model structure and performance substantially depended on training data and even fitting method. We developed two final models: SimBa-IB for typical natural mutations and SimBa-SYM for situations where stabilizing and destabilizing mutations occur to a similar extent. SimBa-SYM, despite is simplicity, is essentially non-biased (vs. the Ssym data set) while still performing well for all data sets (R ~ 0.46-0.54, MAE=1.16-1.24 kcal/mol). The simple models provide advantage in terms of interpretability, use and future improvement, and are freely available on GitHub.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call