Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database

Mariusz Butkiewicz,C Weaver,Edward Lowe,Ralf Mueller,Pedro Teixeira,Jeffrey Mendenhall,Jens Meiler

doi:10.3390/molecules18010735

Abstract

With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate probe development and drug discovery efforts in academia and reduce their cost. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and was carefully curated using confirmation screens that validate active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed, including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme that combines orthogonal machine learning algorithms into a single predictor is tested. Enrichments ranging from 15 to 101 are observed at a true positive rate (TPR) cutoff of 25%.
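The abstract reports enrichment at a fixed TPR cutoff. A minimal sketch of one common definition follows, assuming enrichment is the precision TP/(TP+FP) at the score threshold where the TPR first reaches the cutoff, divided by the fraction of actives in the whole data set. The function name `enrichment_at_tpr` is hypothetical, and the paper's exact definition (for example, its handling of tied scores) may differ in detail.

```python
from typing import Sequence


def enrichment_at_tpr(scores: Sequence[float], labels: Sequence[int], tpr_cutoff: float = 0.25) -> float:
    """Enrichment at the threshold where the true positive rate first reaches tpr_cutoff.

    Assumed definition: precision at that threshold / fraction of actives overall.
    labels: 1 for a confirmed active compound, 0 for an inactive one.
    """
    n_total = len(labels)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one active compound")
    if not 0.0 < tpr_cutoff <= 1.0:
        raise ValueError("tpr_cutoff must lie in (0, 1]")
    # Rank compounds from highest to lowest predicted score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for n_selected, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / n_actives >= tpr_cutoff:
            precision = true_positives / n_selected
            base_rate = n_actives / n_total
            return precision / base_rate
    raise ValueError("TPR cutoff never reached")  # unreachable for a valid cutoff
```

For example, `enrichment_at_tpr([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2], [1, 0, 0, 0, 0, 0, 0, 1])` returns `4.0`: the top-ranked compound is active (precision 1.0) while only 2 of 8 compounds are active (base rate 0.25). A random ranking gives an enrichment near 1.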

Highlights

The development of quantitative structure activity relationship (QSAR) models in ligand-based computer-aided drug discovery (LB-CADD) has shown practical value for in silico high-throughput screening (HTS) to identify potential hit compounds, i.e., compounds that share a biological activity of interest [1].
We focus on HTS experiments with a single well-defined biological target protein.
Nine large data sets were assembled from realistic HTS experiments for a range of common drug target proteins, including G-protein coupled receptors (GPCRs), ion channels, transporters, kinase inhibitors, and enzymes.

Summary

Introduction

The development of quantitative structure activity relationship (QSAR) models in ligand-based computer-aided drug discovery (LB-CADD) has shown practical value for in silico (virtual) high-throughput screening (HTS) to identify potential hit compounds, i.e., compounds that share a biological activity of interest [1]. Public databases such as PubChem [4] contain biological activities for several hundred thousand compounds tested against different biological targets [5]. LB-CADD is attractive in the resource-limited environment of academia because it has the potential to reduce the cost and increase the quality of drug discovery and/or probe development for rare or neglected diseases. Methods [13,14]

Molecular Descriptors Numerically Encode Chemical Structure

Consensus of QSAR Models Has Potential to Improve Prediction Accuracy

Significance

Results and Discussion

Machine Learning Algorithms Relate Chemical Structure to Biological Activity

Quality Measures Assess the Predictive Power of Machine Learning Algorithms

Experimental

GPCR: Allosteric Modulators of M1 Muscarinic Receptor

Ion Channel

Transporter

Kinase Inhibitor

Enzyme

Numerical Description of Molecules for QSAR Model Development

Monitoring Data Set is Used for Early Termination of Training Process
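This heading describes stopping training early based on a monitoring data set. A minimal sketch of generic patience-based early stopping follows, assuming one monitoring-set error value is recorded per training epoch and training halts once the error has not improved for `patience` epochs. The function name and the `patience` parameter are hypothetical; the criterion used in BCL::ChemInfo is not reproduced here.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(monitor_errors: Sequence[float], patience: int = 3) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch monitoring errors.

    best_epoch: epoch with the lowest monitoring error (the model state to keep).
    stop_epoch: epoch at which training halts because no improvement was seen
    for `patience` epochs, or the last epoch if that never happens.
    """
    if not monitor_errors:
        raise ValueError("need at least one epoch")
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(monitor_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(monitor_errors) - 1
```

With errors `[0.9, 0.7, 0.6, 0.65, 0.7, 0.8, 0.5]` and `patience=3`, training stops at epoch 5 and keeps the epoch-2 model, even though a later epoch would have improved; this is the usual trade-off of a patience window.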

Cross-Validation Ascertains Robustness of QSAR Models
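The abstract states that predictive power is evaluated through cross-validation. A minimal sketch of a rotating k-fold split follows, assuming each rotation holds out one fold for testing, one fold for monitoring (early termination), and trains on the rest, with items already shuffled before round-robin fold assignment. The function `rotate_folds` and this exact partitioning are assumptions for illustration, not the paper's documented scheme.

```python
from typing import Iterator, List, Tuple


def rotate_folds(n_items: int, k: int = 5) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, monitor, test) index lists for each of k rotations.

    Assumes the data were shuffled beforehand; folds are assigned round-robin.
    Every index lands in the test fold exactly once across all rotations.
    """
    if k < 3:
        raise ValueError("need at least 3 folds: train, monitor, test")
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for test_fold in range(k):
        monitor_fold = (test_fold + 1) % k  # the next fold serves as the monitoring set
        train = [
            idx
            for f in range(k)
            if f not in (test_fold, monitor_fold)
            for idx in folds[f]
        ]
        yield sorted(train), folds[monitor_fold], folds[test_fold]
```

Averaging a quality measure such as enrichment over the k test folds gives an estimate of robustness that does not depend on a single lucky split.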

Selection of an Optimized Descriptor Set Guides QSAR Model Training
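The abstract names Sequential Feature Forward Selection (SFFS) as one descriptor optimization protocol. A minimal sketch of plain greedy forward selection follows: starting from an empty set, it repeatedly adds the descriptor (or descriptor group) that most improves a caller-supplied score and stops when no addition helps. The scoring callable stands in for something like a cross-validated quality measure and is an assumption, as is the function name; the paper's implementation may differ in its stopping rule and in how descriptors are grouped.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Greedy sequential forward selection; higher score is better."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    limit = len(features) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate adding each remaining feature to the current set.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        cand_score, cand = max(candidates)  # ties broken by feature name
        if cand_score <= best_score:
            break  # no candidate improves the score: stop
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score
```

Each round costs one score evaluation per remaining feature, so with an expensive score such as full model retraining, the number of candidate descriptor groups dominates the runtime.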

3.10. Consensus Prediction Seeks to Improve the Accuracy of Trained QSAR Models

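$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

where $y_i$ is the activity (pIC50 or pEC50) predicted by the $i$-th of $N$ models.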
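Consensus prediction combines several trained models into a single predictor. A minimal sketch follows, assuming the consensus is the unweighted mean of the per-compound predictions of N models that all output values on a common scale (for example pIC50 or pEC50). The function name is hypothetical, and the paper may weight or normalize the individual models differently.

```python
from statistics import fmean
from typing import List, Sequence


def consensus_prediction(model_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-compound predictions across N models.

    model_predictions[i][j] is model i's prediction for compound j.
    """
    if not model_predictions:
        raise ValueError("need at least one model")
    n_compounds = len(model_predictions[0])
    if any(len(preds) != n_compounds for preds in model_predictions):
        raise ValueError("all models must score the same compounds")
    # zip(*...) regroups the predictions compound by compound.
    return [fmean(per_compound) for per_compound in zip(*model_predictions)]
```

For instance, three models predicting `[6.0, 4.0]`, `[5.0, 5.0]`, and `[7.0, 3.0]` for two compounds yield a consensus of `[6.0, 4.0]`. Averaging helps most when the models' errors are weakly correlated, which is the motivation for combining orthogonal algorithms such as ANNs, SVMs, and DTs.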

3.11. Implementation

Conclusions


Journal: Molecules	Publication Date: Jan 8, 2013
Citations: 138	License type: CC BY 4.0
