Abstract

Background
The virtual screening of large compound databases is an important application of structure-activity relationship models. Due to the high structural diversity of these data sets, it is impossible for machine learning based QSAR models, which rely on a specific training set, to give reliable results for all compounds. Thus, it is important to consider the subset of the chemical space in which the model is applicable. The approaches to this problem that have been published so far mostly use vectorial descriptor representations to define this domain of applicability of the model. Unfortunately, these cannot easily be extended to structured kernel-based machine learning models. For this reason, we propose three approaches to estimate the domain of applicability of a kernel-based QSAR model.

Results
We evaluated three kernel-based applicability domain estimations using three different structured kernels on three virtual screening tasks. Each experiment consisted of training a kernel-based QSAR model with support vector regression and ranking a disjoint screening data set according to the predicted activity. For each prediction, the applicability of the model to the respective compound is quantified by a score obtained from an applicability domain formulation. The suitability of each applicability domain estimation is evaluated by comparing the model performance on subsets of the screening data sets obtained with different thresholds on the applicability scores. This comparison indicates that it is possible to separate the part of the chemical space in which the model gives reliable predictions from the part consisting of structures too dissimilar to the training set for the model to be applied successfully. A closer inspection reveals that the virtual screening performance of the model improves considerably if the half of the molecules with the lowest applicability scores is omitted from the screening.

Conclusion
The proposed applicability domain formulations for kernel-based QSAR models can successfully identify compounds for which no reliable predictions can be expected from the model. The resulting reduction of the search space and the elimination of some of the active compounds should not be considered a drawback, because the results indicate that, in most cases, these omitted ligands would not have been found by the model anyway.
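The threshold-based evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that predicted activities, binary activity labels, and applicability scores (higher meaning more applicable) are already available, and it uses the enrichment factor as the screening metric, which is an assumption because this summary does not name the metric used. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def enrichment_factor(predicted: Sequence[float], is_active: Sequence[bool],
                      top_fraction: float) -> float:
    """Active rate in the top-ranked fraction divided by the active rate overall."""
    n = len(predicted)
    n_top = max(1, int(round(n * top_fraction)))
    # Rank compounds by predicted activity, highest first.
    ranking = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    hits = sum(1 for i in ranking[:n_top] if is_active[i])
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        return 0.0
    return (hits / n_top) / (total_actives / n)


def keep_most_applicable(predicted: Sequence[float], is_active: Sequence[bool],
                         applicability: Sequence[float],
                         keep_fraction: float) -> Tuple[List[float], List[bool]]:
    """Keep the given fraction of compounds with the highest applicability scores."""
    n = len(predicted)
    n_keep = max(1, int(round(n * keep_fraction)))
    by_score = sorted(range(n), key=lambda i: applicability[i], reverse=True)
    kept = by_score[:n_keep]
    return [predicted[i] for i in kept], [is_active[i] for i in kept]


# Hypothetical toy screening set: compound 0 is a confident but wrong prediction
# with a low applicability score; compound 6 is an active the model ranks low anyway.
predicted = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
is_active = [False, True, False, True, False, False, True, False]
applicability = [0.1, 0.9, 0.2, 0.8, 0.7, 0.6, 0.3, 0.5]

for keep in (1.0, 0.5):
    sub_pred, sub_act = keep_most_applicable(predicted, is_active, applicability, keep)
    ef = enrichment_factor(sub_pred, sub_act, top_fraction=0.25)
    print(f"keep {keep:.0%}: EF(25%) = {ef:.2f}")
# keep 100%: EF(25%) = 1.33
# keep 50%: EF(25%) = 2.00
```

With these made-up numbers, dropping the less applicable half raises the enrichment in the top quarter; the active compound that is dropped (compound 6) already had a low predicted activity, which mirrors the argument that such ligands would probably not have been found by the model anyway.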

Highlights

  • The virtual screening of large compound databases is an important application of structure-activity relationship models

  • All kernels were capable of describing the Factor Xa inhibitors in a manner that allows learning a quantitative structure-activity relationship (QSAR) model with good cross-validation performance

  • The thrombin and platelet-derived growth factor receptor β (PDGFRβ) data sets seem less suited for learning the respective QSAR, but despite the low correlation coefficients, the prediction error was still small enough to apply the model in a virtual screening (VS) experiment


Summary

Introduction

The virtual screening of large compound databases is an important application of structure-activity relationship models. The approaches that have been published so far for defining the domain of applicability of such a model mostly use vectorial descriptor representations, and these cannot easily be extended to structured kernel-based machine learning models. Docking-based approaches [3,4,5,6,7,8] rank the compounds according to the score obtained by docking each compound into the binding pocket of the respective target protein. These approaches use information about both the small molecule and the structure of the target to estimate the activity; this additional information comes at the expense of an increased prediction time and the need for a 3D structure of the protein. Approaches that start from a known active query molecule give good results in many cases [9,10,11,12], but they depend strongly on the chosen query molecule and may be unable to discover ligands of a different chemotype than the query molecule [13].
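To illustrate how an applicability score can be defined without a vectorial descriptor representation, the sketch below measures the squared distance between a compound and the centroid of the training set in the feature space induced by a kernel, using kernel evaluations only. This is one generic kernel-based formulation chosen for illustration, not necessarily one of the three proposed in the paper; the Tanimoto kernel on sets of substructure labels stands in for the structured kernels of the study, and all names and labels are hypothetical.

```python
from itertools import product
from typing import Callable, FrozenSet, Sequence

Molecule = FrozenSet[str]  # hypothetical: a molecule as a set of substructure labels
Kernel = Callable[[Molecule, Molecule], float]


def tanimoto_kernel(a: Molecule, b: Molecule) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets; a valid kernel on binary fingerprints."""
    union = len(a | b)
    return 1.0 if union == 0 else len(a & b) / union


def make_centroid_distance(training: Sequence[Molecule],
                           kernel: Kernel) -> Callable[[Molecule], float]:
    """Return a function giving the squared feature-space distance to the training centroid.

    ||phi(x) - mean_i phi(x_i)||^2
        = k(x, x) - (2/n) sum_i k(x, x_i) + (1/n^2) sum_{i,j} k(x_i, x_j)
    """
    n = len(training)
    if n == 0:
        raise ValueError("training set must not be empty")
    # The training-only term does not depend on x, so compute it once.
    train_term = sum(kernel(s, t) for s, t in product(training, repeat=2)) / (n * n)

    def distance_sq(x: Molecule) -> float:
        cross_term = sum(kernel(x, t) for t in training) / n
        return kernel(x, x) - 2.0 * cross_term + train_term

    return distance_sq


training = [frozenset({"benzene", "amide", "chloro"}),
            frozenset({"benzene", "amide", "fluoro"})]
near = frozenset({"benzene", "amide", "chloro"})
far = frozenset({"sulfonyl", "pyridine"})

distance_sq = make_centroid_distance(training, tanimoto_kernel)
print(distance_sq(near), distance_sq(far))  # 0.25 1.75
```

A smaller distance means the compound lies closer to the training data, so an applicability score can be taken as the negated distance; any positive semidefinite structured kernel could be passed in place of `tanimoto_kernel`.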

