Abstract

Protein flexibility and solvation pose major challenges to docking algorithms and scoring functions. One established strategy for addressing these challenges is to use multiple protein conformations for docking (all‐against‐all ensemble docking). Recent studies have shown that the performance of ensemble docking can be improved by selecting the most relevant protein structures for docking. In search of a robust approach to protein structure selection, we developed an integrated mAchine Learning AnD DockINg approach (ALADDIN). ALADDIN employs a battery of random forest classifiers to select, individually for each compound of interest, the single most suitable protein structure for docking from an ensemble of protein structures. ALADDIN outperformed the best single‐structure docking runs, ensemble docking and a similarity‐based docking approach on three out of four investigated targets, with up to 0.15, 0.11 and 0.16 higher area under the receiver operating characteristic curve (AUC) values, respectively. Only in the case of cytochrome P450 3A4 did ALADDIN, like all of the other tested approaches, fail to obtain decent performance. ALADDIN can be particularly useful for structure‐based virtual screening of malleable proteins, including kinases, some viral enzymes and anti‐targets.

Highlights

  • Ligand docking is one of the most widely applied computational approaches in drug discovery.[1,2,3] Modern docking algorithms and scoring functions are powerful tools for predicting the likely binding pose of small molecules.[4]

  • The approach for mAchine Learning AnD DockINg (ALADDIN) was tested on four representative human proteins of pharmaceutical relevance, for which we retrieved sets of known ligands and decoys from the DUD-E database:
      * VEGFR2, a principal responder to vascular endothelial growth factor signals and the major signal transducer for angiogenesis.[41,42]
      * p38α MAPK, which mediates cellular responses to injurious stress and immune signaling and regulates tumorigenesis.[43,44]
      * GCR, a nuclear receptor that controls transcription within networks comprising thousands of genes and plays a dominant role in development, metabolism, stress response, inflammation and other organismal processes.[45]
      * CYP3A4, a member of the cytochrome P450 family that metabolizes a large variety of xenobiotics and endogenous compounds.[46]

  • To set reference points for comparing the performance of different docking strategies, we explored the range of AUC values obtained by single-structure docking for the identical sets of protein structures that will be used for evaluating ensemble docking and ALADDIN.
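
To illustrate how such a reference range can be obtained, the sketch below computes one AUC per protein structure from docking scores and reports the minimum and maximum. It is a minimal illustration, not the authors' evaluation code: the function names, the structure labels and the toy scores are hypothetical, and it assumes that a lower (more negative) docking score means a better predicted binder.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """AUC as the probability that a random active outscores a random decoy.

    Assumption: lower docking score = better. Ties count as half a win.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def auc_range(
    scores_by_structure: Dict[str, Sequence[float]],
    is_active: Sequence[bool],
) -> Tuple[float, float, Dict[str, float]]:
    """Return (lowest AUC, highest AUC, per-structure AUCs)."""
    aucs = {name: roc_auc(s, is_active) for name, s in scores_by_structure.items()}
    return min(aucs.values()), max(aucs.values()), aucs


# Hypothetical toy data: two actives followed by two decoys,
# docked against two hypothetical protein structures.
toy_labels = [True, True, False, False]
toy_scores = {
    "structure_A": [-9.0, -8.0, -7.0, -6.0],  # both actives rank first
    "structure_B": [-6.0, -9.0, -8.0, -7.0],  # one active ranks last
}
lowest, highest, per_structure = auc_range(toy_scores, toy_labels)
```

The pairwise comparison is quadratic in the number of compounds, which is fine for a sketch; a rank-based implementation would be preferable for full DUD-E sets.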


Summary

Introduction

Ligand docking is one of the most widely applied computational approaches in drug discovery.[1,2,3] Modern docking algorithms and scoring functions are powerful tools for predicting the likely binding pose of small molecules.[4] One of the most widely applied strategies to mitigate the problem of protein flexibility is to generate ensembles of representative (and generally static) target structures for docking.[13,14] In this so-called (all-against-all) ensemble docking approach, ligands of interest are individually docked against each of the ensemble structures, and the predictions are assessed according to user-defined scoring schemes.[15,16]
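
The all-against-all scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring scheme used in the study: the function name, the conformation and ligand labels and the scores are hypothetical, and lower docking scores are assumed to be better. The user-defined scoring scheme is passed in as a plain function, here either the best score across the ensemble (`min`) or the average (`mean`).

```python
from statistics import mean
from typing import Callable, Dict, List


def ensemble_scores(
    scores_by_structure: Dict[str, Dict[str, float]],
    scheme: Callable[[List[float]], float] = min,
) -> Dict[str, float]:
    """Collapse per-structure docking scores into one score per ligand.

    scores_by_structure[structure][ligand] holds the docking score of that
    ligand against that structure (assumption: lower = better). Only ligands
    docked against every structure are kept.
    """
    if not scores_by_structure:
        raise ValueError("need at least one protein structure")
    common = set.intersection(*(set(s) for s in scores_by_structure.values()))
    return {
        ligand: scheme([scores[ligand] for scores in scores_by_structure.values()])
        for ligand in sorted(common)
    }


# Hypothetical toy data: two ligands docked against two conformations.
docked = {
    "conf_1": {"lig_a": -9.1, "lig_b": -6.0},
    "conf_2": {"lig_a": -7.5, "lig_b": -8.2},
}
best_per_ligand = ensemble_scores(docked, scheme=min)
mean_per_ligand = ensemble_scores(docked, scheme=mean)
```

Taking the best score rewards a ligand for fitting any one conformation, whereas the mean penalizes ligands that fit only a few; which is more appropriate depends on the target and is left to the user here.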

