Abstract
Consensus-scoring methods are commonly used with molecular docking in virtual screening campaigns to filter potential ligands for a protein target. Traditional consensus methods combine results from different docking programs by averaging the score or rank of each molecule obtained from individual programs. Unfortunately, these methods fail if one of the docking programs has poor performance, which is likely to occur due to training-set dependencies and scoring-function parameterization. In this work, we introduce a novel consensus method that overcomes these limitations. We combine the results from individual docking programs using a sum of exponential distributions as a function of the molecule rank for each program. We test the method over several benchmark systems using individual and ensembles of target structures from diverse protein families with challenging decoy/ligand datasets. The results demonstrate that the novel method outperforms the best traditional consensus strategies over a wide range of systems. Moreover, because the novel method is based on the rank rather than the score, it is independent of the score units, scales and offsets, which can hinder the combination of results from different structures or programs. Our method is simple and robust, providing a theoretical basis not only for molecular docking but also for any consensus strategy in general.
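The rank-based combination described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the decay parameter `sigma` and the exact functional form of the exponential weighting are assumptions for demonstration purposes.

```python
import math

def exponential_consensus_rank(ranks_by_program, sigma=10.0):
    """Combine per-program ranks into a consensus ranking.

    ranks_by_program: dict mapping program name -> {molecule: rank},
    where rank 1 is the best. Each molecule's consensus score is the
    sum over programs of exp(-rank / sigma); higher is better.
    sigma (an assumed tuning parameter) controls how quickly a
    program's contribution decays with rank.
    """
    scores = {}
    for ranks in ranks_by_program.values():
        for mol, rank in ranks.items():
            scores[mol] = scores.get(mol, 0.0) + math.exp(-rank / sigma)
    # Sort molecules by descending consensus score (best first).
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranks from three docking programs for three molecules.
ranks = {
    "progA": {"m1": 1, "m2": 2, "m3": 3},
    "progB": {"m1": 2, "m2": 1, "m3": 3},
    "progC": {"m1": 1, "m2": 3, "m3": 2},
}
print(exponential_consensus_rank(ranks))  # -> ['m1', 'm2', 'm3']
```

Because only ranks enter the sum, the result is unaffected by each program's score units, scale, or offset, which is the property the abstract emphasizes.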
Highlights
Experimental methods for drug discovery involve high-throughput screening techniques, in which large numbers of compounds are experimentally tested and their activity is evaluated towards a biological target[1]
It has been found that the effectiveness of each program is system-dependent, mainly because both the search algorithms used to find correct poses and the scoring functions depend on the training sets and parameterization protocols
In this work, by performing docking-based virtual screening studies on several molecular systems with seven scoring functions, we show that some consensus-scoring methodologies avoid the system-bias effects typically found in individual docking programs, such as those arising from parameter training or overfitting
Summary
Experimental methods for drug discovery involve high-throughput screening techniques, in which large numbers of compounds are experimentally tested and their activity towards a biological target is evaluated[1]. Computer-aided methods have emerged as a way to decrease the time and economic cost of experimental trials by evaluating large datasets of molecules in virtual screening campaigns[2,3,4,5]. With these methods, it is possible to filter compounds that are potentially active towards a protein target from large datasets. Alternative rank-based consensus strategies, such as ‘rank-by-rank’ and ‘rank-by-vote’[40], or score-based strategies, such as ‘average of auto-scaled scores’[36], ‘Z-score’[41], and ‘rank-by-number’[40], have become popular in recent years[38,41,42] (see the Methods for a detailed description). Although these consensus methodologies have been shown to produce better results than individual scoring functions and docking programs, their implementation may be subject to biases and errors in data management[40], as we show later. This work focuses on developing a method that extracts only the best rank for each molecule from a consensus perspective
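Two of the traditional strategies named above can be sketched as follows. The function names and the toy data are illustrative assumptions; the Z-score sketch assumes more negative docking scores are better, which is the usual convention but may differ between programs.

```python
import statistics

def rank_by_rank(ranks_by_program):
    """'Rank-by-rank': average each molecule's rank across programs.

    ranks_by_program: dict mapping program -> {molecule: rank}.
    Returns molecules sorted by mean rank (lower is better).
    """
    mols = next(iter(ranks_by_program.values())).keys()
    avg = {m: statistics.mean(r[m] for r in ranks_by_program.values())
           for m in mols}
    return sorted(avg, key=avg.get)

def z_score_consensus(scores_by_program):
    """'Z-score' consensus: standardize each program's scores, then
    average the Z-scores per molecule. Assumes lower (more negative)
    docking scores are better, so the lowest mean Z-score ranks first.
    """
    mols = next(iter(scores_by_program.values())).keys()
    total = {m: 0.0 for m in mols}
    for scores in scores_by_program.values():
        mu = statistics.mean(scores.values())
        sd = statistics.stdev(scores.values())
        for m in mols:
            total[m] += (scores[m] - mu) / sd
    avg = {m: total[m] / len(scores_by_program) for m in mols}
    return sorted(avg, key=avg.get)

# Hypothetical inputs from two docking programs.
ranks = {"progA": {"m1": 1, "m2": 2, "m3": 3},
         "progB": {"m1": 2, "m2": 1, "m3": 3}}
scores = {"progA": {"m1": -9.0, "m2": -7.0, "m3": -5.0},
          "progB": {"m1": -8.0, "m2": -6.5, "m3": -7.5}}
print(rank_by_rank(ranks))        # -> ['m1', 'm2', 'm3']
print(z_score_consensus(scores))  # -> ['m1', 'm3', 'm2']
```

Note that score-based strategies such as the Z-score require the per-program standardization step precisely because raw scores from different programs live on different scales, a problem the rank-based approach avoids entirely.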