The effect of structural redundancy in validation sets on virtual screening performance

Robert D Clark, Jennifer K Shepphird, John Holliday

doi:10.1002/cem.1240

Abstract

The performance of a classification model is often assessed in terms of how well it separates a set of known observations into appropriate classes. If the validation sets used for such analyses are redundant due to bias in sampling, the conclusions drawn may have limited relevance to prospective work in which new kinds of positives are sought. In the case of the various virtual screening techniques used in modern drug discovery, such bias generally appears as over‐representation of particular structural subclasses in the test set. We show how clustering by substructural similarity, followed by applying arithmetic and harmonic weighting schemes to receiver operating characteristic (ROC) curves, can be used to identify validation sets that are biased due to such redundancies. This can be accomplished qualitatively by direct examination or quantitatively by comparing the areas under the respective linear or semilog curves (AUCs or pAUCs). Copyright © 2009 John Wiley & Sons, Ltd.
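
The idea of cluster-weighted ROC curves can be illustrated with a minimal sketch. The abstract does not define the two weighting schemes, so the definitions used here are assumptions made for illustration only: arithmetic weighting is taken to mean that each active counts 1/(size of its cluster), and harmonic weighting is taken to mean that the k-th best-scoring member of a cluster counts 1/k. The function names, the toy scores, and the cluster labels are all hypothetical, and decoys are left unweighted.

```python
from collections import Counter, defaultdict


def arithmetic_weights(clusters):
    """Assumed scheme: each active is weighted by 1 / (size of its cluster)."""
    sizes = Counter(clusters)
    return [1.0 / sizes[c] for c in clusters]


def harmonic_weights(scores, clusters):
    """Assumed scheme: the k-th best-scoring member of a cluster gets 1 / k."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    seen = defaultdict(int)
    weights = [0.0] * len(scores)
    for i in order:
        seen[clusters[i]] += 1
        weights[i] = 1.0 / seen[clusters[i]]
    return weights


def weighted_auc(active_scores, active_weights, decoy_scores):
    """Area under a ROC curve whose y-axis is the weighted fraction of actives.

    Higher scores rank first. On tied scores, actives are ranked ahead of
    decoys (a simplification of this sketch).
    """
    total = sum(active_weights)
    ranked = sorted(
        [(s, w) for s, w in zip(active_scores, active_weights)]
        + [(s, None) for s in decoy_scores],
        key=lambda t: -t[0],
    )
    found = 0.0  # cumulative weight of actives retrieved so far
    area = 0.0
    for _, w in ranked:
        if w is None:
            area += found / total  # one decoy = one step along the x-axis
        else:
            found += w
    return area / len(decoy_scores)


# Hypothetical toy data: three actives from one redundant cluster "A" score
# well, while the only active from cluster "B" scores poorly.
active_scores = [0.9, 0.8, 0.7, 0.2]
active_clusters = ["A", "A", "A", "B"]
decoy_scores = [0.6, 0.5, 0.4, 0.1]

auc_unweighted = weighted_auc(active_scores, [1.0] * 4, decoy_scores)
auc_arithmetic = weighted_auc(
    active_scores, arithmetic_weights(active_clusters), decoy_scores
)
auc_harmonic = weighted_auc(
    active_scores, harmonic_weights(active_scores, active_clusters), decoy_scores
)
```

On this toy data the unweighted AUC is 0.8125, while the arithmetic-weighted AUC drops to 0.625. A gap of that kind between unweighted and weighted areas is the sort of signal the abstract describes for flagging a validation set whose apparent performance is driven by an over-represented structural subclass.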

Journal: Journal of Chemometrics
Publication Date: Apr 27, 2009
Citations: 8

Similar Papers

Identifying and characterizing promiscuous targets: Implications for virtual screening
Violeta I Pérez-Nueno ... David W Ritchie
Expert Opinion on Drug Discovery | VOL. 7
08 Nov 2011

Pharmacophore alignment search tool: Influence of the third dimension on text‐based similarity searching
Volker Hähnke ... Alexander Klenner
Journal of Computational Chemistry | VOL. 32
15 Feb 2011

Construction of a Machine Learning Dataset through Collaboration: The RSNA 2019 Brain CT Hemorrhage Challenge.
Adam E Flanders ...
Radiology. Artificial intelligence | VOL. 2
29 Apr 2020

Evaluation of QSAR Equations for Virtual Screening.
Jacob Spiegel ... Hanoch Senderowitz
International Journal of Molecular Sciences | VOL. 21
22 Oct 2020
