The similarity between protein–ligand and host–guest complexes has led to the direct extension of end-point free energy techniques to the virtual screening of the latter. Despite the extensive use of end-point calculations in modelling and predicting host–guest binding, the field still lacks a large-scale benchmark with sufficient dataset size, sampling length and chemical-space coverage to rule out confounding factors and provide an unbiased statistical view of the practical performance and value of end-point screening. Cucurbiturils, one of the mainstream families of macrocycles, are widely used in pharmaceutical research and industrial applications as drug reservoirs and carriers. In this work, we gather a large dataset of ∼150 host–guest complexes involving cucurbiturils with varying numbers of repeating units, extensively sample the conformational space of the physical end states with both the single- and three-trajectory protocols, and extract affinity estimates with a variety of MM/GBSA and QM/GBSA Hamiltonians using implicit solvents. The diversity of chemical compositions, the quality of configurational sampling and the comprehensiveness of parameter combinations are unprecedentedly high, allowing us to draw firm conclusions about end-point performance in host–guest binding. Contrary to previous reports that incorporating a QM treatment improves performance, switching to QM/GBSA Hamiltonians degrades both the reproduction of absolute affinities and the prediction of affinity ranks in our large-scale end-point benchmark, regardless of the sampling protocol employed (i.e., single- or three-trajectory). Overall, for end-point screening of cucurbituril host–guest complexes, we recommend retaining the standard MM/GBSA Hamiltonians for both sampling and free energy estimation. 
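The difference between the single- and three-trajectory protocols mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the workflow used in this work: it assumes per-frame effective energies G = E_MM + G_GB + G_SA (or their QM/GBSA analogue) have already been computed by an external program, it neglects the conformational entropy term, and the function names and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical helpers: energies are per-frame effective free energies in kcal/mol,
# assumed to be precomputed elsewhere; the entropy term is neglected here.

def single_trajectory_dg(frames):
    """Single-trajectory estimate.

    `frames` is a list of (G_complex, G_host, G_guest) tuples, where host and guest
    energies are evaluated on snapshots stripped from the same complex trajectory.
    """
    return mean(g_c - g_h - g_g for g_c, g_h, g_g in frames)

def three_trajectory_dg(complex_g, host_g, guest_g):
    """Three-trajectory estimate.

    Each list comes from its own simulation (complex, free host, free guest),
    so the lists may have different lengths.
    """
    return mean(complex_g) - mean(host_g) - mean(guest_g)

if __name__ == "__main__":
    # Placeholder numbers, not data from this benchmark.
    print(single_trajectory_dg([(-100.0, -60.0, -30.0), (-102.0, -61.0, -31.0)]))
    print(three_trajectory_dg([-100.0, -102.0], [-60.0, -61.0], [-30.0, -31.0]))
```

In the single-trajectory protocol the host and guest energies come from the same snapshots as the complex, so intramolecular terms largely cancel and statistical noise is reduced; the three-trajectory protocol samples each species separately, which can capture conformational reorganisation upon binding at the cost of larger noise.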
Among all end-point protocols benchmarked in this work, the three-trajectory GAFF + GBOBC-I regime is the most accurate and robust choice, achieving top performance in both reproducing absolute affinities and ranking different host–guest pairs. Docking with seven mainstream protocols has been regarded as a promising tool for host–guest binding, but on this large host–guest dataset its ranking power does not surpass that of the end-point methods, supporting the value of post-docking end-point screening.
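The comparison above rests on two kinds of score: how well absolute affinities are reproduced and how well host–guest pairs are ranked. The abstract does not name its metrics, so as an assumption the sketch below uses two common choices, root-mean-square error (RMSE) for absolute affinities and Spearman's rank correlation ρ for ranking, implemented with the Python standard library only. All input values are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(predicted, experimental):
    """Ranking power: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(experimental))

def rmse(predicted, experimental):
    """Accuracy of absolute affinities, in the units of the inputs (e.g. kcal/mol)."""
    n = len(predicted)
    return sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)

if __name__ == "__main__":
    # Hypothetical binding free energies (kcal/mol) for three host–guest pairs.
    pred = [-10.0, -8.0, -12.0]
    expt = [-9.0, -7.5, -11.0]
    print(spearman_rho(pred, expt), rmse(pred, expt))
```

A protocol can rank pairs perfectly (ρ = 1) while still carrying a systematic offset in absolute affinities (nonzero RMSE), which is why the two aspects are assessed separately.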