Abstract
We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable for fine-grained lead optimization as well as potent new lead identification.
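The multiple-instance setting mentioned above can be illustrated with a toy sketch: each ligand is a "bag" of candidate poses, only the ligand as a whole carries an activity label, and learning alternates between picking the best-scoring pose and updating the model. Everything below is an assumption made for illustration. The linear `linear_score` function is a hypothetical stand-in for a learned scoring field, the max-over-poses rule is a common multiple-instance convention, and the gradient update is a generic choice rather than the optimization used by QuanSA.

```python
from typing import List, Sequence, Tuple

Pose = Sequence[float]  # hypothetical feature vector describing one ligand pose


def linear_score(weights: Sequence[float], pose: Pose) -> float:
    """Hypothetical stand-in for a learned scoring field: a plain dot product."""
    return sum(w * x for w, x in zip(weights, pose))


def best_pose(weights: Sequence[float], poses: Sequence[Pose]) -> Tuple[int, float]:
    """Return (index, score) of the highest-scoring pose in one ligand's pose pool."""
    scores = [linear_score(weights, p) for p in poses]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


def fit(
    bags: Sequence[Sequence[Pose]],
    activities: Sequence[float],
    n_features: int,
    epochs: int = 200,
    lr: float = 0.1,
) -> List[float]:
    """Alternate between pose selection and a squared-error gradient step.

    Assumption: the predicted activity of a ligand is the score of its
    best pose under the current weights.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for poses, target in zip(bags, activities):
            idx, pred = best_pose(weights, poses)  # choose a pose under the current model
            err = pred - target
            # Update the model using only the chosen pose.
            weights = [w - lr * err * x for w, x in zip(weights, poses[idx])]
    return weights


if __name__ == "__main__":
    # Two toy ligands, each with two candidate poses and one activity value.
    toy_bags = [[[1.0, 0.0], [0.2, 0.0]], [[0.0, 1.0], [0.0, 0.3]]]
    toy_activities = [2.0, 3.0]
    w = fit(toy_bags, toy_activities, n_features=2)
    for poses in toy_bags:
        print(best_pose(w, poses))
```

The point of the design is that pose choice and model parameters depend on each other, so neither can be fixed in advance; the loop re-selects the pose for every ligand each time the weights change.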
Highlights
Binding affinity prediction continues to be a challenge for computer-aided drug design, especially in the case where there is no high-resolution experimental structure of the target of interest
We introduce a new machine-learning method for induction of models from structure-activity data: Quantitative Surface-field Analysis (QuanSA)
We present details of the QuanSA algorithms and results on multiple data sets, including two with particular significance from a QSAR benchmarking perspective, eight from a validation report for free-energy perturbation, four from a recent benchmark where QuanSA pocket-fields were applied to extensive ChEMBL data, and one case of particular pharmaceutical interest where model refinement was explored
Summary
Binding affinity prediction continues to be a challenge for computer-aided drug design, especially in the case where there is no high-resolution experimental structure of the target of interest.

The final cliques, along with their associated pose pools, are sorted according to score, with each forming a possible alternative starting point for the QuanSA learning procedure. This procedure automatically addresses the problem of ligand selection for core initial alignment construction as well as that of making use of the maximal context of data from many training ligands.

Each of these has either been a centrally important QSAR benchmark (e.g., the steroid and 5-HT1a sets), a challenging independently curated benchmark (the Sutherland Set, consisting of the GABAA (aka “BZR”), COX2, AchE, and thrombin cases), one that allows for direct comparison to a physics-based approach (the FEP Set), or a data set that offers particular insight into the application of ligand-based binding site modeling to medicinal chemistry lead optimization (the muscarinic set).

The numbering scheme for the muscarinic antagonists used here was taken from the original reports, with series A numbering from Johansson et al. [42] and series B from Nordvall et al. [43].