RASPD+: Fast Protein-Ligand Binding Free Energy Prediction Using Simplified Physicochemical Features.

Stefan Holderbach,B Jayaram,Lukas Adam,Goutam Mukherjee,Rebecca C Wade

doi:10.3389/fmolb.2020.601065

Abstract

The virtual screening of large numbers of compounds against target protein binding sites has become an integral component of drug discovery workflows. This screening is often done by computationally docking ligands into a protein binding site of interest, but docking has the drawback that a large number of poses must be evaluated to obtain accurate estimates of protein-ligand binding affinity. Here, we introduce a fast pre-filtering method for ligand prioritization that is based on a set of machine learning models and uses simple pose-invariant physicochemical descriptors of the ligands and the protein binding pocket. Our method, Rapid Screening with Physicochemical Descriptors + machine learning (RASPD+), is trained on PDBbind data and, without the need for generating ligand poses, achieves a regression performance that is better than that of the original RASPD method and traditional scoring functions on a range of different test sets. Additionally, we use RASPD+ to identify molecular features important for binding affinity and assess the ability of RASPD+ to enrich active molecules from decoys.
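The enrichment of active molecules from decoys mentioned above is commonly quantified with an enrichment factor. The following is a minimal sketch of that general metric, not the evaluation protocol of the paper: the function name `enrichment_factor`, the default screened fraction, and the toy scores are assumptions chosen for illustration.

```python
def enrichment_factor(scores, is_active, fraction=0.01, lower_is_better=True):
    """Enrichment factor of actives within the top `fraction` of a ranked library.

    scores          : predicted score per molecule (e.g. a binding free energy)
    is_active       : True for known actives, False for decoys
    fraction        : share of the ranked library that is inspected (assumed value)
    lower_is_better : True if more negative scores mean stronger predicted binding
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and at least one active")

    # Rank molecules so that the most favourable predictions come first.
    order = sorted(range(n_total), key=lambda i: scores[i],
                   reverse=not lower_is_better)

    # Inspect at least one molecule, even for very small fractions.
    n_top = max(1, int(round(fraction * n_total)))
    hits = sum(1 for i in order[:n_top] if is_active[i])

    # Hit rate in the selected subset divided by the hit rate of the whole library.
    return (hits / n_top) / (n_actives / n_total)


# Toy example: 10 molecules, 3 actives, top 20 % of the ranking inspected.
toy_scores = [-10.0, -9.0, -8.0, -5.0, -4.0, -3.0, -2.0, -1.0, -0.5, -0.2]
toy_labels = [True, True, False, False, True, False, False, False, False, False]
toy_ef = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
print(f"EF at 20%: {toy_ef:.2f}")
```

An enrichment factor of 1 corresponds to random selection, and larger values indicate that actives are concentrated near the top of the ranking.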

Highlights

Virtual screening to assess in silico the binding of candidate ligands to a target protein is a key component of structure-based drug design procedures (Torres et al, 2019; Wang et al, 2020).
We demonstrate the capabilities of RASPD+ for binding free energy regression and compare its performance to established scoring functions.
We considered linear regression (LR), as it was used in the previous RASPD approach (Mukherjee and Jayaram, 2013), support vector regression (SVR; Drucker et al, 1997), k-nearest neighbors, simple deep neural networks (DNN), random forests (RF; Breiman, 2001), and a variant of the latter, extremely random forests (Geurts et al, 2006).

Summary

Introduction

Virtual screening to assess in silico the binding of candidate ligands to a target protein is a key component of structure-based drug design procedures (Torres et al, 2019; Wang et al, 2020). In docking-based screening, a scoring function is evaluated to approximate the binding free energy, and this is used to rank the binding poses and the different candidate ligands for their ability to bind to the target protein. Docking procedures are frequently supplemented by methods employing molecular dynamics simulations with the aim of computing more accurate binding affinities. However, both docking and molecular dynamics simulations often fail to provide predictions of binding free energy at the desired level of accuracy, and they are demanding in terms of computational effort and expertise (Willems et al, 2020).
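For orientation, a binding free energy can be related to a measured dissociation constant through the standard thermodynamic relation ΔG = RT ln(Kd / 1 M). The sketch below illustrates only this textbook conversion. The function names, the choice of kcal/mol as the unit, and the default temperature of 298.15 K are assumptions made here and are not taken from the paper.

```python
import math

# Gas constant in kcal/(mol*K); the use of kcal/mol is an assumption of this sketch.
R_KCAL_PER_MOL_K = 1.98720425864083e-3


def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Convert a dissociation constant (in mol/L) to a binding free energy in kcal/mol.

    Uses Delta G = R * T * ln(Kd / 1 M), i.e. a 1 M standard state.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


def delta_g_to_kd(delta_g_kcal, temperature_k=298.15):
    """Inverse conversion: binding free energy in kcal/mol to Kd in mol/L."""
    return math.exp(delta_g_kcal / (R_KCAL_PER_MOL_K * temperature_k))


# A nanomolar binder corresponds to roughly -12.3 kcal/mol at 298.15 K.
print(f"{kd_to_delta_g(1e-9):.2f} kcal/mol")
```

On this scale, each tenfold change in Kd shifts ΔG by about 1.4 kcal/mol at room temperature, which gives a sense of the accuracy a scoring function needs in order to rank ligands reliably.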

Methods

Results

Conclusion