Abstract
High-throughput screening (HTS) plays a pivotal role in lead discovery for the pharmaceutical industry. In tandem, cheminformatics approaches are employed to increase the probability of identifying novel biologically active compounds by mining the HTS data. HTS data are notoriously noisy, so selecting the optimal data mining method is important for the success of such an analysis. Here, we describe a retrospective analysis of four HTS data sets using three mining approaches: Laplacian-modified naive Bayes, recursive partitioning, and support vector machine (SVM) classifiers, with increasing stochastic noise in the form of false positives and false negatives. All three data mining methods tolerated increasing levels of false positives, even when the ratio of misclassified compounds to true active compounds in the training set was 5:1; false negatives at a ratio of 1:1 were tolerated as well. SVM outperformed the other two methods in capturing active compounds and scaffolds in the top 1%. A Murcko scaffold analysis could explain the differences in enrichment among the four data sets. This study demonstrates that data mining methods can add real value to a screen even when the data are contaminated with a high level of stochastic noise.
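To make the noise model concrete, the sketch below injects stochastic label noise into a binary active/inactive training set at the ratios described in the abstract (false positives at up to 5:1 and false negatives at 1:1 relative to the true actives). This is a hypothetical illustration, not the paper's actual protocol: the function name, the random-sampling scheme, and the interpretation of the ratios (false positives counted against the original actives; false negatives chosen so that flipped actives equal the remaining true actives) are assumptions made for clarity.

```python
import random

def inject_label_noise(labels, fp_ratio=5.0, fn_ratio=1.0, seed=42):
    """Simulate stochastic HTS noise by flipping binary labels.

    Hypothetical helper, not from the paper.
    labels: list of 0/1 (1 = active).
    fp_ratio: false positives added per true active
              (inactives relabelled as active).
    fn_ratio: ratio of flipped actives to remaining true actives
              (actives relabelled as inactive).
    """
    rng = random.Random(seed)
    noisy = list(labels)
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]

    # False positives: inactive compounds mislabelled as active.
    n_fp = min(int(fp_ratio * len(actives)), len(inactives))
    for i in rng.sample(inactives, n_fp):
        noisy[i] = 1

    # False negatives: actives mislabelled as inactive, so that
    # flipped:remaining = fn_ratio (1:1 flips half the actives).
    n_fn = round(fn_ratio / (1.0 + fn_ratio) * len(actives))
    for i in rng.sample(actives, n_fn):
        noisy[i] = 0
    return noisy
```

A classifier's noise tolerance can then be probed by training on `inject_label_noise(labels)` at increasing ratios while evaluating enrichment against the clean labels.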