Ensemble learning from ensemble docking: revisiting the optimum ensemble size problem

Sara Mohammadi, Mitra Ashouri, Zahra Narimani, Mohammad Hossein Karimi‐Jafari, Rohoullah Firouzi

doi:10.1038/s41598-021-04448-5

Abstract

Despite considerable advances obtained by applying machine learning approaches in protein–ligand affinity predictions, the incorporation of receptor flexibility has remained an important bottleneck. While ensemble docking has been used widely as a solution to this problem, the optimum choice of receptor conformations is still an open question considering the issues related to the computational cost and false positive pose predictions. Here, a combination of ensemble learning and ensemble docking is suggested to rank different conformations of the target protein in light of their importance for the final accuracy of the model. Available X-ray structures of cyclin-dependent kinase 2 (CDK2) in complex with different ligands are used as an initial receptor ensemble, and its redundancy is removed through a graph-based redundancy removal, which is shown to be more efficient and less subjective than clustering-based representative selection methods. A set of ligands with available experimental affinities is docked to this nonredundant receptor ensemble, and the energetic features of the best scored poses are used in an ensemble learning procedure based on the random forest method. The importance of receptors is obtained through feature selection measures, and it is shown that a few of the most important conformations are sufficient to reach 1 kcal/mol accuracy in affinity prediction, with considerable improvement of the early enrichment power of the models compared with different ensemble docking strategies that do not use learning. A clear strategy is provided in which machine learning selects the most important experimental conformers of the receptor among a large set of protein–ligand complexes while simultaneously maintaining the final accuracy of affinity predictions at the highest level possible for the available data. Our results could be informative for future attempts to design receptor-specific docking-rescoring strategies.
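The abstract does not spell out how the graph-based redundancy removal works, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that conformations are graph nodes, that two nodes are joined when their pairwise RMSD falls below a cutoff, and that representatives are chosen greedily by connectivity. The function name `nonredundant_subset`, the conformation labels, the RMSD values, and the 1.0 Å cutoff are all hypothetical.

```python
def nonredundant_subset(names, rmsd, cutoff):
    """Greedily pick representatives from a similarity graph (illustrative only).

    names  -- labels of the receptor conformations
    rmsd   -- symmetric square matrix (list of lists) of pairwise RMSD values
    cutoff -- two conformations count as redundant when their RMSD is below this
    """
    n = len(names)
    # Edge i-j exists when the two conformations are closer than the cutoff.
    neighbours = {
        i: {j for j in range(n) if j != i and rmsd[i][j] < cutoff}
        for i in range(n)
    }
    remaining = set(range(n))
    kept = []
    while remaining:
        # Keep the most connected remaining node; ties go to the lowest index.
        best = max(sorted(remaining), key=lambda i: len(neighbours[i] & remaining))
        kept.append(names[best])
        # Drop the representative and everything it makes redundant.
        remaining -= neighbours[best] | {best}
    return kept


# Hypothetical toy data: conf_a is close to conf_b and conf_c; conf_d is isolated.
names = ["conf_a", "conf_b", "conf_c", "conf_d"]
rmsd = [
    [0.0, 0.4, 0.6, 3.0],
    [0.4, 0.0, 1.5, 3.0],
    [0.6, 1.5, 0.0, 3.0],
    [3.0, 3.0, 3.0, 0.0],
]
print(nonredundant_subset(names, rmsd, cutoff=1.0))
```

Unlike clustering, this kind of selection needs only a single similarity cutoff and no preset number of clusters, which is one way a graph-based approach could be less subjective; whether that matches the paper's argument is an assumption here.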

Highlights

Despite considerable advances obtained by applying machine learning approaches in protein–ligand affinity predictions, the incorporation of receptor flexibility has remained an important bottleneck.
Two segments corresponding to residues 35–47 and 147–164 have some missing residues in almost all structures in the absence of cyclin, while a long segment around 220–252 has some missing residues in many cyclin-dependent kinase 2 (CDK2) structures complexed with cyclin.
The large size of the receptor ensemble affects the performance of the procedure; the optimal size of the receptor ensemble is an open question from both the computational cost and accuracy points of view.

Summary

Introduction

Despite considerable advances obtained by applying machine learning approaches in protein–ligand affinity predictions, the incorporation of receptor flexibility has remained an important bottleneck. For therapeutic applications, structure-based drug design attempts to design candidate drugs according to their interactions with the three-dimensional structures of the target proteins[1,2,3]. In this direction, docking technologies have been improved considerably with two main goals: accurate prediction of the geometry of binding (pose prediction) and reliable scoring of different binding geometries (affinity prediction)[4,5,6]. The dynamics of proteins over their conformational ensembles enable them to harness thermodynamic fluctuations for specific recognition of their targets, including small molecules[15]. This conformational diversity adds a search dimension to docking procedures, and as a result, different approaches have been designed to incorporate receptor flexibility through ensemble docking[16,17]. An earlier study reported steady gains in the performance of the two methods it examined as the training set size and the type and number of features were increased[37].
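To make the idea of ensemble docking concrete, the sketch below shows two common ways of collapsing per-receptor docking scores into one score per ligand without any learning step: taking the best score over the ensemble, or averaging over it. This is an illustration under stated assumptions, not the strategies evaluated in the paper. The function names, ligand and receptor labels, and score values are hypothetical, and lower scores are assumed to mean stronger predicted binding.

```python
from statistics import mean


def ensemble_scores(score_table, strategy="best"):
    """Collapse per-receptor docking scores into one score per ligand.

    score_table -- {ligand: {receptor: docking score}}, lower is better (assumed)
    strategy    -- "best" takes the minimum over receptors, "mean" the average
    """
    if strategy == "best":
        aggregate = min
    elif strategy == "mean":
        aggregate = mean
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {
        ligand: aggregate(list(per_receptor.values()))
        for ligand, per_receptor in score_table.items()
    }


def rank_ligands(scores):
    """Return ligand names ordered from best (lowest) to worst score."""
    return sorted(scores, key=scores.get)


# Hypothetical scores for two ligands docked to two receptor conformations.
score_table = {
    "lig1": {"rec1": -7.0, "rec2": -9.0},
    "lig2": {"rec1": -8.0, "rec2": -8.5},
}
print(rank_ligands(ensemble_scores(score_table, "best")))
print(rank_ligands(ensemble_scores(score_table, "mean")))
```

In this toy table the two strategies rank the ligands in opposite orders, which illustrates why the choice and number of receptor conformations can change the outcome of a screen.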

Journal: Scientific Reports	Publication Date: Jan 10, 2022
Citations: 20	License type: open-access
