Abstract
There is strong interest in accurate methods for predicting changes in protein stability resulting from amino acid mutations to the protein sequence. Recombinant proteins must often be stabilized to be used as therapeutics or reagents, and destabilizing mutations are implicated in a variety of diseases. Due to increased data availability and improved modeling techniques, recent studies have shown advancements in predicting changes in protein stability when a single-point mutation is made. Less focus has been directed toward predicting changes in protein stability when there are two or more mutations. Here, we analyze the largest available dataset of double point mutation stability and benchmark several widely used protein stability models on this and other datasets. We find that additive models of protein stability perform surprisingly well on this task, achieving similar performance to comparable non-additive predictors according to most metrics. Accordingly, we find that neither artificial intelligence-based nor physics-based protein stability models consistently capture epistatic interactions between single mutations. We observe one notable deviation from this trend, which is that epistasis-aware models provide marginally better predictions than additive models on stabilizing double point mutations. We develop an extension of the ThermoMPNN framework for double mutant modeling, as well as a novel data augmentation scheme, which mitigates some of the limitations in currently available datasets. Collectively, our findings indicate that current protein stability models fail to capture the nuanced epistatic interactions between concurrent mutations due to several factors, including training dataset limitations and insufficient model sensitivity.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have