The last few years have seen the rapid proliferation of machine learning methods to design binding proteins. Although these methods have shown large increases in experimental success rates compared to prior approaches, most of their predictions still fail when tested experimentally. Evidently, computational methods still struggle to distinguish the features of real protein binding interfaces from false predictions. In this study, short molecular dynamics simulations of 20 antibody-protein complexes were conducted to identify features of interactions that should occur in binding interfaces. Intermolecular salt bridges, hydrogen bonds, and hydrophobic interactions were evaluated for their persistence, energy, and stability during the simulations. Only hydrogen bonds in which both residues are stabilized in the bound complex were found to be expected to persist and meaningfully contribute to binding between the proteins. In contrast, stabilization was not a requirement for salt bridges and hydrophobic interactions to persist. Still, interactions where both residues are stabilized in the bound complex persist significantly longer and have significantly stronger energies than other interactions. Two hundred and twenty real antibody-protein complexes and 8194 decoy complexes were used to train and test a random forest classifier using the features of expected persistent interactions identified in this study and the macromolecular features of interaction energy (IE), buried surface area (BSA), IE/BSA, and shape complementarity. This classifier was compared with one trained only on the expected persistent interaction features and another trained only on the macromolecular features. Inclusion of the expected persistent interaction features reduced the false positive rate of the classifier by two- to five-fold across a range of true positive classification rates.
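The comparison above is reported as a false positive rate at a given true positive rate. As a rough illustration of how such a number can be read off a set of classifier scores, here is a minimal sketch in plain Python. It is not the authors' code: the function name `fpr_at_tpr`, the label encoding (1 for a real complex, 0 for a decoy), and the toy scores are all assumptions made for this example, and the toy numbers are invented rather than taken from the study.

```python
from typing import Sequence


def fpr_at_tpr(labels: Sequence[int], scores: Sequence[float], target_tpr: float) -> float:
    """False positive rate at the strictest threshold that reaches target_tpr.

    labels: 1 for a real complex, 0 for a decoy (hypothetical encoding).
    scores: classifier scores, where higher means "more likely real".
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one real complex and one decoy")
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")

    # Sort by descending score and lower the threshold one score value at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / n_pos >= target_tpr:
            return fp / n_neg
    return 1.0  # fallback; not reached once the checks above pass


# Invented toy data: 4 real complexes followed by 6 decoys.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical scores from a classifier using macromolecular features only.
toy_macro_scores = [0.9, 0.8, 0.6, 0.4, 0.85, 0.7, 0.5, 0.3, 0.2, 0.1]
# Hypothetical scores from a classifier that also uses interaction features.
toy_combined_scores = [0.95, 0.9, 0.55, 0.5, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    for tpr in (0.75, 1.0):
        macro = fpr_at_tpr(toy_labels, toy_macro_scores, tpr)
        combined = fpr_at_tpr(toy_labels, toy_combined_scores, tpr)
        print(f"TPR >= {tpr:.2f}: macro FPR = {macro:.3f}, combined FPR = {combined:.3f}")
```

Evaluating both score sets at the same target true positive rate keeps the comparison fair, since the fold reduction in false positives is then measured at matched sensitivity.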