Rapid, accurate, precise and reproducible ligand-protein binding free energy prediction.

Shunzhou Wan, Peter V Coveney, Stefan J Zasada, Agastya P Bhati

doi:10.1098/rsfs.2020.0007

Abstract

A central quantity of interest in molecular biology and medicine is the free energy of binding of a molecule to a target biomacromolecule. Until recently, the accurate prediction of binding affinity had been widely regarded as out of reach of theoretical methods owing to the lack of reproducibility of the available methods, not to mention their complexity, computational cost and time-consuming procedures. The lack of reproducibility stems primarily from the chaotic nature of classical molecular dynamics (MD) and the associated extreme sensitivity of trajectories to their initial conditions. Here, we review computational approaches for both relative and absolute binding free energy calculations, and illustrate their application to a diverse set of ligands bound to a range of proteins with immediate relevance in a number of medical domains. We focus on ensemble-based methods, which are essential for computing statistically robust results, including two we have recently developed, namely thermodynamic integration with enhanced sampling and enhanced sampling of MD with an approximation of continuum solvent. Together, these form a set of rapid, accurate, precise and reproducible free energy methods. They can be used in real-world problems such as the hit-to-lead and lead optimization stages of drug discovery, and in personalized medicine. These applications show that individual binding affinities equipped with uncertainty quantification may be computed in a few hours on a massive scale, given access to suitable high-end computing resources and workflow automation. A high level of accuracy can be achieved using these approaches.
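The role of ensembles in uncertainty quantification can be illustrated with a minimal sketch. It assumes that each replica (an independent MD run started from different initial conditions) yields one binding free energy estimate, and it attaches a bootstrap standard error to the ensemble mean. The function name, the replica values and the choice of bootstrap resampling are illustrative assumptions, not the authors' published protocol.

```python
import random
import statistics


def ensemble_estimate(replica_dgs, n_boot=2000, seed=0):
    """Return (ensemble mean, bootstrap standard error) for replica estimates.

    replica_dgs: one free energy estimate per independent replica (kcal/mol).
    n_boot and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    n = len(replica_dgs)
    mean = statistics.fmean(replica_dgs)
    # Resample the replicas with replacement and record the mean each time.
    boot_means = [
        statistics.fmean(rng.choices(replica_dgs, k=n)) for _ in range(n_boot)
    ]
    # The spread of the resampled means estimates the uncertainty of the mean.
    return mean, statistics.stdev(boot_means)


# Hypothetical per-replica binding free energies in kcal/mol (made-up numbers).
replica_dgs = [-8.1, -7.6, -8.4, -7.9, -8.0]
mean, err = ensemble_estimate(replica_dgs)
print(f"dG = {mean:.2f} +/- {err:.2f} kcal/mol")
```

A single replica would give one of the five numbers above with no indication of its reliability; the ensemble gives both a central value and an error bar.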

Highlights

The use of computer models and simulations to understand natural systems is widespread, encompassing many diverse disciplines in academia as well as industry.
We focus on the convergence, reproducibility and reliability of observable properties obtained from molecular dynamics (MD) simulations.
(i) ‘Informatics’ based approaches which are usually the output of docking studies in combination with so-called ‘machine learning’ [84,85,86,87]; (ii) linear interaction energy (LIE) methods [88]; (iii) molecular mechanics Poisson–Boltzmann surface area (MMPBSA) and molecular mechanics generalized Born surface area (MMGBSA) methods [89] based on invoking a continuum approximation for the aqueous solvent to approximate, e.g. electrostatic interactions following all-atom MD simulations; and (iv) alchemical methods including thermodynamic integration (TI) and free energy perturbation (FEP).
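As a rough illustration of the thermodynamic integration named in item (iv), the free energy change along an alchemical coupling parameter λ is the integral of the average ∂H/∂λ from λ = 0 to λ = 1. The sketch below assumes that each λ window has been simulated by several replicas, averages ∂H/∂λ over the replicas in each window, and integrates with the trapezoidal rule. The function name, the λ grid and the input values are hypothetical; this is generic TI, not the authors' enhanced-sampling variant.

```python
import statistics


def ti_free_energy(lambdas, dhdl_replicas):
    """Trapezoidal-rule thermodynamic integration over lambda windows.

    lambdas: increasing coupling-parameter values from 0 to 1.
    dhdl_replicas: for each window, a list of per-replica averages of dH/dlambda.
    """
    # Ensemble average of dH/dlambda in each window.
    window_means = [statistics.fmean(reps) for reps in dhdl_replicas]
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * (window_means[i] + window_means[i + 1]) * width
    return total


# Synthetic check: replicas scattered symmetrically around dH/dlambda = 2 * lambda,
# whose exact integral over [0, 1] is 1.0 (arbitrary energy units).
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
dhdl_replicas = [[2 * lam - 0.1, 2 * lam, 2 * lam + 0.1] for lam in lambdas]
delta_g = ti_free_energy(lambdas, dhdl_replicas)
print(f"delta G = {delta_g:.3f}")
```

Real ∂H/∂λ curves are rarely linear, so the number and placement of λ windows affects the integration error; the trapezoidal rule here is only the simplest choice.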

Summary

Introduction

The use of computer models and simulations to understand natural systems is widespread, encompassing many diverse disciplines in academia as well as industry. Beyond the provision of qualitative insight, as our understanding increases one would hope to use these methods to quantitatively predict the outcome of experiments prior to, and even instead of, performing them [1,2,3]. In this way, computational techniques should reduce time and cost in industrial processes like the discovery of drugs and advanced materials, which take more than 10 years and $2.6 billion for the former [4], and 20 years and perhaps $10 billion for the latter. The relentless enhancement in the performance of high-end computers is another key factor accounting for the increasing adoption of computer-based methods in science over recent decades.

Objectives

Methods

Conclusion