The size and complexity of large molecular systems, such as protein-ligand complexes, make accurate post-Hartree-Fock calculations computationally challenging. This study benchmarks the Molecules-in-Molecules (MIM) method and presents a clear, accessible strategy for selecting layers and levels of theory in post-Hartree-Fock computations on large molecular systems, notably protein-ligand complexes. An approach is described that improves computational efficiency by canceling the subsystem energy terms common to the complex and the protein in the supermolecular equation. Using DLPNO-based post-Hartree-Fock methods together with the three-layer MIM method (MIM3), the study obtains protein-ligand binding energies with errors below 1 kcal mol⁻¹ while significantly reducing computational cost. Theoretically computed interaction energies also correlate well with their experimental counterparts, with R² values of approximately 0.90 and 0.78 for the CDK2 and BZT-ITK sets, respectively, supporting the use of the MIM method for calculating binding energies. By highlighting the crucial role of diffuse or small Pople-style basis sets in the middle layer for reducing energy errors, this work provides practical guidance for interaction energy computations in large molecular complexes and opens avenues for application across a diverse range of molecular systems.
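The cancellation of common subsystem terms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy fragment labels (`P1`, `P2`, `P3`, `L`), and the representation of each system as a map from subsystem to inclusion-exclusion coefficient are all hypothetical. It also assumes the protein has the same geometry in the complex and in the isolated protein, so that a subsystem made of the same fragments has the same energy in both.

```python
from collections import defaultdict


def net_subsystem_terms(complex_terms, protein_terms):
    """Subtract the protein's signed subsystem terms from the complex's.

    Each argument maps a subsystem (a frozenset of fragment labels) to its
    signed coefficient in the fragment-energy sum. Subsystems that appear
    with the same coefficient in both systems cancel and are dropped, so
    their energies never need to be computed.
    """
    net = defaultdict(int)
    for subsystem, coeff in complex_terms.items():
        net[subsystem] += coeff
    for subsystem, coeff in protein_terms.items():
        net[subsystem] -= coeff
    return {s: c for s, c in net.items() if c != 0}


def binding_energy(net_terms, energy_of, ligand_energy):
    """E(complex) - E(protein) - E(ligand), using only surviving terms.

    `energy_of` is a hypothetical callable returning the energy of one
    subsystem at the chosen level of theory.
    """
    return sum(c * energy_of(s) for s, c in net_terms.items()) - ligand_energy


# Toy system: three protein fragments (P1, P2, P3) and a ligand (L) near P3.
complex_terms = {
    frozenset({"P1", "P2"}): 1,
    frozenset({"P2", "P3", "L"}): 1,
    frozenset({"P2"}): -1,  # overlap of the two subsystems above
}
protein_terms = {
    frozenset({"P1", "P2"}): 1,
    frozenset({"P2", "P3"}): 1,
    frozenset({"P2"}): -1,
}

net = net_subsystem_terms(complex_terms, protein_terms)
for subsystem, coeff in net.items():
    print(sorted(subsystem), coeff)
```

In this toy case only the two subsystems close to the ligand survive; the `{P1, P2}` and `{P2}` terms cancel. In a layered scheme the same bookkeeping would be repeated for each layer and level of theory.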