Efficient conformational sampling and weak scoring in docking programs? Strategy of the wisdom of crowds

Ludovic Chaput, Liliane Mouawad

doi:10.1186/s13321-017-0227-x

Abstract

Background

In drug design, efficient structure-based optimization of a ligand requires precise knowledge of the protein–ligand interactions. In the absence of experimental information, docking programs are necessary for ligand positioning, and the choice of a reliable program is essential for the success of such an optimization. The performance of four popular docking programs, Gold, Glide, Surflex and FlexX, was investigated using 100 crystal structures of complexes taken from the Directory of Useful Decoys-Enhanced (DUD-E) database.

Results

The ligand conformational sampling was rather efficient, with a correct pose found for a maximum of 84 complexes, obtained by Surflex. However, the ranking of the correct poses was not as efficient, with a maximum of 68 top-rank or 75 top-4 rank correct poses given by Glidescore. No relationship was found between either the sampling or the scoring performance of the four programs and the properties of either the targets or the small molecules, except for the number of ligand rotatable bonds. Likewise, no exploitable relationship was found between each program's performance in docking and in virtual screening; a wrong top-rank pose may obtain a good score that allows it to be ranked among the most active compounds, and vice versa. To improve the results of docking, the strengths of the programs were combined either by using a rescoring procedure or the United Subset Consensus (USC). Oddly, positioning with Surflex and rescoring with Glidescore did not improve the results. However, USC based on docking allowed us to obtain a correct pose in the top-4 rank for 87 complexes. Finally, nine complexes were scrutinized because a correct pose was found by at least one program but poorly ranked by all four programs. Contrary to expectations, except for one case, this was not due to weaknesses of the scoring functions.

Conclusions

We conclude that the scoring functions should be improved to detect the correct poses, but sometimes their failure may be due to other varied considerations. To increase the chances of success, we recommend using several programs and combining their results.

Graphical abstract

Summary of the results obtained by semi-rigid docking of crystallographic ligands. The docking was done on 100 protein–ligand X-ray structures, taken from the DUD-E database, using four programs: Glide, Gold, Surflex and FlexX. Based on the docking results, we applied our United Subset Consensus method (USC), for which only the top-4 rank poses are relevant. The gray boxes represent the number of complexes for which the best pose is correct; the blue and red boxes correspond to the number of complexes with a correct pose ranked as the top 1 or within the top 4. A pose is considered correct when its root-mean-square deviation from the crystal structure is less than 2 Å.
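The correctness criterion above (RMSD from the crystal pose below 2 Å) and the three reported counts (a correct pose sampled at any rank, ranked first, or ranked within the top 4) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the poses are assumed to share the crystal ligand's atom ordering and reference frame, and symmetry-equivalent atoms are not handled, which a real evaluation would need to address.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two poses given as lists of (x, y, z) tuples.

    Assumption: both lists use the same atom order and the same frame,
    so no superposition or symmetry correction is applied.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def first_correct_rank(pose_rmsds, threshold=2.0):
    """Return the 1-based rank of the first correct pose, or None.

    pose_rmsds is ordered by the program's score, best-scored pose first.
    A pose counts as correct when its RMSD is strictly below the threshold.
    """
    for rank, value in enumerate(pose_rmsds, start=1):
        if value < threshold:
            return rank
    return None


def success_counts(rmsds_by_complex, threshold=2.0, top_n=4):
    """Count complexes with a correct pose sampled, ranked first, or in the top N."""
    sampled = top1 = topn = 0
    for pose_rmsds in rmsds_by_complex.values():
        rank = first_correct_rank(pose_rmsds, threshold)
        if rank is None:
            continue  # no correct pose was sampled for this complex
        sampled += 1
        if rank == 1:
            top1 += 1
        if rank <= top_n:
            topn += 1
    return {"sampled": sampled, "top1": top1, "top_n": topn}


# Made-up RMSD values (in Å) for four hypothetical complexes.
toy = {
    "complex_a": [1.0, 5.0],
    "complex_b": [3.0, 1.5],
    "complex_c": [4.0, 4.0, 4.0, 4.0, 1.0],
    "complex_d": [9.0],
}
print(success_counts(toy))
```

Separating the sampled count from the top-1 and top-N counts mirrors the distinction the abstract draws between sampling efficiency and scoring efficiency: a complex such as `complex_c` in the toy data is a sampling success but a ranking failure.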

Highlights

In drug design, an efficient structure-based optimization of a ligand needs the precise knowledge of the protein–ligand interactions
The targets aofb and casp3 are covalently bound to their crystal ligands, so they were excluded from the dataset because the docking programs cannot handle them properly in their standard usage
In this study, we present the results of the evaluation of four docking programs: Glide, Gold, Surflex and FlexX

Summary

Introduction

Efficient structure-based optimization of a ligand requires precise knowledge of the protein–ligand interactions. Kellenberger et al. [24] evaluated the ability of six docking programs to recover the X-ray pose for 100 protein–ligand complexes. They reported that Gold, Glide, Surflex and FlexX were the most accurate programs and that, in general, their performance decreased as the size of the binding site, the size of the ligand and the number of its rotatable bonds increased. Wang et al. [25] reported, from a comprehensive evaluation of ten docking programs, that the correlation between the scores of some programs (Gold, Glide and Surflex, among others) and the binding affinities may be high for certain protein families (up to 0.7). They concluded that these programs may be more suitable for these families. This assertion seems fragile in the absence of information concerning the degree of identity between the proteins of the same family and the similarities between their ligands.
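To make the score-versus-affinity correlation mentioned above concrete, here is a small Python sketch of a correlation coefficient computed over one protein family. It assumes the Pearson coefficient, although the text does not say which coefficient Wang et al. used; the function name and the numbers are hypothetical and serve only as illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up docking scores and binding affinities for one hypothetical family.
scores = [5.1, 6.3, 7.0, 8.2, 6.8]
affinities = [4.9, 6.0, 6.2, 8.5, 7.4]
print(round(pearson_r(scores, affinities), 2))
```

Score sign conventions differ between programs (for some, a more negative score is better), so the sign of the coefficient has to be interpreted per program. A coefficient computed on a handful of closely related proteins and ligands can also be inflated by their similarity, which is the concern raised in the paragraph above.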

Methods

Results

Conclusion