Abstract

This study proposes a novel consensus strategy, based on linear combinations of different docking scores, for evaluating virtual screening campaigns. The consensus models are generated with the recently proposed Enrichment Factor Optimization (EFO) method, which builds the linear equations by exhaustively combining the available docking scores and optimizing the resulting enrichment factors. The performance of this consensus strategy was evaluated by simulating the entire Directory of Useful Decoys (DUD) datasets. In detail, the poses were initially generated by the PLANTS docking program and then rescored by ReScore+ with and without minimization of the complexes. The resulting scores were then used to generate consensus models including two or three different scoring functions. The reliability of the generated models was assessed by a per-target validation, as performed by default by the EFO approach. The encouraging performance of the proposed strategy is underlined by an average increase of 17% in the Top 1% enrichment factor (EF) values when comparing the single best score with the linear combination of three scores. Kinases offer a particularly convincing demonstration of its efficacy: their average Top 1% EF rises from 6.4 with the single best-performing primary score to 23.5 when scoring functions are linearly combined. The beneficial effects are also clearly noticeable across the entire DUD datasets, where the average area under the curve (AUC) increases by 14% when three scores are combined. The AUC values obtained compare very well with those reported in the literature by an extended set of recent benchmarking studies, and the three-variable models afford the highest AUC average.
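
For readers unfamiliar with the metric, the enrichment factor and the form of the consensus score can be written as follows. This is the commonly used definition of EF and a generic linear combination; the abstract does not spell out either, so the symbols (a, n, A, N, c_i, s_i) are notation assumed here for illustration.

```latex
% a_{x\%}: actives found in the top x% of the ranked list
% n_{x\%}: compounds in the top x% of the ranked list
% A, N:    total actives and total compounds in the dataset
\mathrm{EF}_{x\%} = \frac{a_{x\%} / n_{x\%}}{A / N}

% Consensus score: linear combination of k docking scores s_i
% with coefficients c_i (k = 2 or 3 in this study)
S = \sum_{i=1}^{k} c_i \, s_i , \qquad k \in \{2, 3\}
```

Under this definition, an EF of 1 corresponds to random selection, and the kinase result quoted above (6.4 to 23.5) is roughly a 3.7-fold gain in the Top 1% EF.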

Highlights

  • Virtual screening (VS) comprises computational approaches that aim to identify, within huge molecular databases, optimized sets of compounds which have the potential to bind a given biological target and which will undergo high-throughput screening (HTS) to identify novel hit compounds [1]

  • The analyses are subdivided into three parts, which involve: (a) the primary Piecewise Linear Potential (PLP) score and its normalized values as directly computed by PLANTS; (b) the various scoring functions as computed by ReScore+ without post-docking minimization; and (c) the same scoring functions after post-docking minimization

  • The study describes the exploitation of linear combinations of more than one docking score as a consensus strategy to enhance the reliability of docking simulations in virtual screening campaigns


Summary

Introduction

Virtual screening (VS) comprises computational approaches that aim to identify, within huge molecular databases, optimized sets of compounds which have the potential to bind a given biological target and which will undergo high-throughput screening (HTS) to identify novel hit compounds [1]. Even though docking results are clearly influenced by the applied search algorithms, scoring functions represent the most critical factor in determining the overall reliability of docking simulations [12]. On these grounds, a convenient strategy to combine several docking procedures while limiting the computational cost is rescoring: the ligand poses are initially generated by a single, reasonably satisfactory docking program and are then used to calculate an extended set of docking scores, among which the best-performing ones are suitably selected and/or combined [13]. The present study explores the possibility of applying to VS analyses the recently proposed classification algorithm which generates linear combinations of docking scores as selected by the Enrichment Factor Optimization (EFO) algorithm [17]. Such an approach, which was developed to conveniently classify unbalanced datasets, should find successful applications in the analysis of VS campaigns, which involve extremely unbalanced databases. The predictive power of the generated models with two or three variables was assessed by a per-target validation in which each DUD dataset was repeatedly subdivided into training and test sets, and the models were evaluated and selected by considering their average performance on the test sets.
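
The workflow described above (exhaustively combining docking scores, optimizing the enrichment factor, and validating on repeated training/test splits) can be illustrated with a minimal Python sketch. It is not the EFO implementation: the coefficient grid, the stratified random split, the 30% test share, and all function names (`enrichment_factor`, `combine`, `best_pair`, `validate`) are assumptions made for illustration, and lower scores are assumed to rank first.

```python
import itertools
import random


def enrichment_factor(values, labels, top_fraction=0.01):
    """EF of the top fraction; lower values rank first (assumed convention)."""
    n = len(values)
    n_top = max(1, round(n * top_fraction))
    order = sorted(range(n), key=lambda i: values[i])
    hits = sum(labels[i] for i in order[:n_top])
    return (hits / n_top) / (sum(labels) / n)


def combine(columns, weights):
    """Row-wise linear combination of the given score columns."""
    n_rows = len(columns[0])
    return [sum(w * col[i] for w, col in zip(weights, columns))
            for i in range(n_rows)]


def best_pair(scores, labels, rows, grid=(-1.0, -0.5, 0.5, 1.0),
              top_fraction=0.01):
    """Try every pair of scores and every weight pair on a coarse grid.

    Returns (EF, (name_a, name_b), (w_a, w_b)) for the best EF on `rows`.
    The grid is a hypothetical stand-in for a real coefficient search.
    """
    sub_labels = [labels[i] for i in rows]
    best = None
    for a, b in itertools.combinations(sorted(scores), 2):
        cols = [[scores[a][i] for i in rows], [scores[b][i] for i in rows]]
        for weights in itertools.product(grid, repeat=2):
            ef = enrichment_factor(combine(cols, weights), sub_labels,
                                   top_fraction)
            if best is None or ef > best[0]:
                best = (ef, (a, b), weights)
    return best


def validate(scores, labels, n_repeats=5, test_share=0.3,
             top_fraction=0.01, seed=0):
    """Mean test-set EF over repeated stratified training/test splits."""
    rng = random.Random(seed)
    actives = [i for i, y in enumerate(labels) if y == 1]
    decoys = [i for i, y in enumerate(labels) if y == 0]
    test_efs = []
    for _ in range(n_repeats):
        rng.shuffle(actives)
        rng.shuffle(decoys)
        # Split actives and decoys separately so both sets contain actives.
        cut_a = max(1, round(len(actives) * test_share))
        cut_d = max(1, round(len(decoys) * test_share))
        test = actives[:cut_a] + decoys[:cut_d]
        train = actives[cut_a:] + decoys[cut_d:]
        # Fit on the training rows only, then score the held-out rows.
        _, names, weights = best_pair(scores, labels, train,
                                      top_fraction=top_fraction)
        cols = [[scores[name][i] for i in test] for name in names]
        test_efs.append(enrichment_factor(combine(cols, weights),
                                          [labels[i] for i in test],
                                          top_fraction))
    return sum(test_efs) / len(test_efs)
```

Here `scores` is assumed to be a dictionary mapping a scoring-function name to a list of values (one per compound) and `labels` a list of 1 (active) or 0 (decoy). Extending the search to three-variable models would mean iterating over triples of scores and weight triples in the same way.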

Single Variable Models
Two-Variable Consensus Models
Three-Variable Consensus Models
Comparison with Already Published Studies
Generation and Validation of Predictive Models
Conclusions