In this study, we systematically examined the energy distribution of bioactive conformations of small-molecule ligands within their conformational ensembles using ANI-2x, a machine learning potential, in conjunction with one of our recently developed geometry optimization algorithms, conjugate gradient with backtracking line search (CG-BS). We first evaluated the combination of these methods (ANI-2x/CG-BS) using two molecule sets. For the 231-molecule set, ab initio calculations were performed at both the ωB97X/6-31G(d) and B3LYP-D3BJ/DZVP levels for accuracy comparison, while for the 8,992-molecule set, ab initio calculations were carried out at the B3LYP-D3BJ/DZVP level only. For each molecule in the two sets, up to 10 conformations were generated, which diminishes the influence of individual outliers on the performance evaluation. Encouraged by the performance of ANI-2x/CG-BS in these evaluations, we used it to calculate the energy distributions for more than 27,000 ligands in the Protein Data Bank (PDB). Each ligand has at least one conformation bound to a biological molecule, and this conformation is labeled as a bound conformation. In addition to the bound conformations, up to 200 conformations were generated for each ligand using OpenEye's Omega2 software (https://docs.eyesopen.com/applications/omega/). We then performed a statistical analysis of how the bound conformation energies are distributed within the ensembles for the 17,197 PDB ligands whose bound conformation energies fall within the energy ranges of their Omega2-generated conformation ensembles. We found that for half of these ligands the bound conformation lies less than 2.91 kcal/mol above the global minimum conformation, and about 90% of the bound conformations are within 10 kcal/mol of the global minimum conformation energy. 
This information can guide the construction of libraries for shape-based virtual screening and help docking algorithms sample bound conformations more efficiently.
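The statistical analysis described above can be illustrated with a minimal Python sketch. It assumes that each ligand is represented by one bound-conformation energy and a list of generated-ensemble energies, all in kcal/mol, and that a ligand is kept only when its bound energy lies between the lowest and highest ensemble energies. The function names and the toy energies are hypothetical and are not taken from the study; the energies in the study come from ANI-2x/CG-BS optimization, which is not reproduced here.

```python
from statistics import median


def relative_bound_energy(bound_energy, ensemble_energies):
    """Energy of the bound conformation relative to the lowest ensemble energy.

    Returns None when the bound energy falls outside the ensemble's energy
    range (assumed filtering rule; such ligands are excluded from the summary).
    """
    e_min = min(ensemble_energies)
    e_max = max(ensemble_energies)
    if bound_energy < e_min or bound_energy > e_max:
        return None
    return bound_energy - e_min


def summarize(relative_energies, threshold=10.0):
    """Median relative energy and fraction of ligands at or below a threshold."""
    within = sum(1 for e in relative_energies if e <= threshold)
    return {
        "median": median(relative_energies),
        "fraction_within_threshold": within / len(relative_energies),
    }


# Hypothetical toy data: ligand id -> (bound energy, ensemble energies), kcal/mol.
ligands = {
    "lig_a": (1.5, [0.0, 2.0, 5.0]),
    "lig_b": (12.0, [0.0, 3.0, 15.0]),
    "lig_c": (-1.0, [0.0, 4.0]),  # bound energy outside the range: excluded
    "lig_d": (3.0, [1.0, 6.0]),
}

relative = []
for bound, ensemble in ligands.values():
    rel = relative_bound_energy(bound, ensemble)
    if rel is not None:
        relative.append(rel)

summary = summarize(relative)
print(relative)
print(summary)
```

In this sketch the median plays the role of the 2.91 kcal/mol figure reported above, and the threshold fraction plays the role of the roughly 90% of bound conformations found within 10 kcal/mol.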