FFENCODER-PL: Pair Wise Energy Descriptors for Protein-Ligand Pose Selection.

Jun Pei, Lin Frank Song, Kenneth M Merz

doi:10.1021/acs.jctc.1c00503

Abstract

Scoring functions are an essential component of molecular docking methods. An accurate scoring function is expected to distinguish the native ligand pose from decoy poses. Our previous work (Pei et al. J. Chem. Inf. Model. 2019, 59 (7), 3305-3315) showed that combining the random forest (RF) algorithm with knowledge-based potential functions can emphasize germane pair wise interactions and improve the performance of the original knowledge-based potential functions in protein-ligand decoy detection. One of the most important classes of potential function is the force field (FF) potential, one example being the Amber collection of FFs, which is widely available in the AMBER suite of simulation programs. However, RF modeling studies require pair wise energies, which are hard to extract directly from Amber. To address this issue, FFENCODER-PL was constructed to calculate the pair wise energies based on the FF14SB and GAFF2 FFs in Amber. FFENCODER-PL was validated using 275 ligand and 21 protein-ligand structures. RF models were built by combining an RF classification algorithm with the pair wise energies calculated by FFENCODER-PL. CASF-2016 (Su et al. J. Chem. Inf. Model. 2019, 59, 895-913) was employed to test the performance of the resultant RF models, which outperformed 33 scoring functions on the accuracy and native ranking tests. In the best decoy RMSD test, the RF models select a best decoy with an RMSD of around 2 Å from the native pose once best decoy-decoy comparisons are included in the RF model. The relative importance of the RF algorithm and the force field potentials was also tested; the results suggest that both are important and that only their combination achieves high accuracy. Finally, FFENCODER-PL makes force field-based pair wise energies available for further development of machine learning-based scoring functions.
The code and data used in this paper can be found at https://github.com/JunPei000/Amber_protein_ligand_encoding.
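To make the idea of a pair wise energy descriptor concrete, the following minimal sketch computes nonbonded protein-ligand pair energies and sums them by atom-type pair. It is not the FFENCODER-PL implementation. It assumes the standard Amber nonbonded form (a 12-6 Lennard-Jones term with Lorentz-Berthelot combining rules plus a Coulomb term), and the atom records, parameter values, function names, and grouping by atom-type pair are all hypothetical choices made for illustration. Cutoffs, bonded terms, and 1-4 scaling are omitted.

```python
import math
from collections import defaultdict

# Electrostatic conversion constant used by Amber, in kcal*Angstrom/(mol*e^2).
COULOMB_K = 332.0522


def pair_energy(a, b):
    """Nonbonded energy (kcal/mol) between two atoms: 12-6 LJ plus Coulomb."""
    r = math.dist(a["xyz"], b["xyz"])
    # Lorentz-Berthelot combining rules for the pair parameters.
    rmin = a["rmin_half"] + b["rmin_half"]
    eps = math.sqrt(a["eps"] * b["eps"])
    s6 = (rmin / r) ** 6
    lj = eps * (s6 * s6 - 2.0 * s6)
    coulomb = COULOMB_K * a["q"] * b["q"] / r
    return lj + coulomb


def pairwise_descriptor(protein, ligand):
    """Sum pair energies into one feature per (protein type, ligand type)."""
    features = defaultdict(float)
    for p in protein:
        for l in ligand:
            features[(p["type"], l["type"])] += pair_energy(p, l)
    return dict(features)


# Toy atoms with made-up parameters (hypothetical, not FF14SB or GAFF2 values).
protein = [
    {"type": "P1", "xyz": (0.0, 0.0, 0.0), "q": 0.0, "rmin_half": 1.0, "eps": 0.25},
]
ligand = [
    {"type": "L1", "xyz": (2.0, 0.0, 0.0), "q": 0.0, "rmin_half": 1.0, "eps": 0.25},
    {"type": "L1", "xyz": (0.0, 2.0, 0.0), "q": 0.0, "rmin_half": 1.0, "eps": 0.25},
]
descriptor = pairwise_descriptor(protein, ligand)
```

Each pose of a complex would yield one such feature vector, and the vectors for native and decoy poses could then be passed to a random forest classifier. That step is not shown here.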

Full Text