The FoldX force field was originally validated against a database of 1000 mutants, at a time when few high-resolution structures were available. Here we manually curated a database of 5556 mutants affecting protein stability, yielding 2484 high-confidence mutations that we call the FoldX Stability Dataset (FSD). These mutations are represented in non-redundant X-ray structures with resolution better than 2.5 Å and do not involve duplicates, metals or prosthetic groups. Using this database, we created a new version of the FoldX force field by introducing Pi stacking and pH dependency for all charged residues, improving aromatic-aromatic interactions, modifying the Ncap contribution and the α-helix dipole, recalibrating the side-chain entropy of methionine, adjusting the H-bond parameters, and modifying the solvation contribution of tryptophan, among others. These changes significantly improve predictions for specific mutants involving the above residues and interactions, and give a statistically significant improvement in FoldX predictions overall as well as for the majority of the 20 amino acids. After removing all training-set data from the FSD (giving the VFSD dataset), predictions improved from R = 0.693 (RMSE = 1.277 kcal/mol) for the previously released version to R = 0.706 (RMSE = 1.252 kcal/mol). On the VFSD, FoldX achieves 95% accuracy when an error of ± 0.85 kcal/mol in the prediction is allowed, and an AUC of 0.78 for predicting the sign of the energy change upon mutation. FoldX versions 4.1 and 5.1 are freely available for academics at https://foldxsuite.crg.eu/. Supplementary data are available at Bioinformatics online.
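To make the reported figures of merit concrete, the following minimal sketch shows one way such metrics could be computed from paired experimental and predicted stability changes. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names, the toy values, the reading of "accuracy" as the fraction of predictions within a tolerance, and the convention that a positive energy change is the positive class for the AUC are all assumptions.

```python
import math

def pearson_r(experimental, predicted):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_e * var_p)

def rmse(experimental, predicted):
    """Root-mean-square error, in the same units as the inputs (kcal/mol)."""
    squared = [(e - p) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def fraction_within(experimental, predicted, tolerance=0.85):
    """Fraction of predictions within +/- tolerance of experiment.
    Assumption: this is one possible reading of 'accuracy considering an error'."""
    hits = sum(abs(e - p) <= tolerance for e, p in zip(experimental, predicted))
    return hits / len(experimental)

def sign_auc(experimental, predicted):
    """ROC AUC for predicting the sign of the energy change.
    Assumption: experimental values > 0 form the positive class, and the
    predicted value is used directly as the ranking score."""
    pos = [p for e, p in zip(experimental, predicted) if e > 0]
    neg = [p for e, p in zip(experimental, predicted) if e <= 0]
    if not pos or not neg:
        raise ValueError("both signs must be present to compute an AUC")
    # Probability that a random positive case scores above a random negative one.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy values in kcal/mol, not taken from the FSD or VFSD.
toy_experimental = [1.2, -0.5, 2.0, 0.3, -1.0]
toy_predicted = [0.9, -0.1, 2.9, 0.6, -1.4]

print("R    =", round(pearson_r(toy_experimental, toy_predicted), 3))
print("RMSE =", round(rmse(toy_experimental, toy_predicted), 3))
print("within 0.85 =", fraction_within(toy_experimental, toy_predicted))
print("AUC  =", sign_auc(toy_experimental, toy_predicted))
```

The pairwise AUC above is quadratic in the number of mutants, which is acceptable for a dataset of a few thousand entries; a rank-based formulation would scale better for larger sets.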