Mining basic active structures from a large-scale database

Naoto Takada, Norihito Ohmori, Takashi Okada

doi:10.1186/1758-2946-5-15


Open Access

https://doi.org/10.1186/1758-2946-5-15


Journal: Journal of Cheminformatics
Publication Date: Mar 16, 2013
Citations: 10
License type: CC BY 2.0

Affiliation: Kwansei Gakuin University

Abstract

Background: The PubChem database is a large-scale resource for chemical information, containing millions of chemical compound activities derived by high-throughput screening (HTS). The ability to extract characteristic substructures from such enormous amounts of data is steadily growing in importance. Compounds with shared basic active structures (BASs) exhibiting G-protein coupled receptor (GPCR) activity and repeated dose toxicity have been mined from small datasets. However, the mining process employed was not applicable to large datasets owing to a large imbalance between the numbers of active and inactive compounds. In most datasets, one active compound appears for every 1000 inactive compounds, whereas most mining techniques work well only when these numbers are similar.

Results: This difficulty was overcome by sampling an equal number of active and inactive compounds. The sampling process was repeated to maintain the structural diversity of the inactive compounds. An interactive KNIME workflow that enabled effective sampling and data cleaning processes was created. The application of the cascade model and subsequent structural refinement yielded the BAS candidates. Repeated sampling increased the ratio of active compounds containing these substructures. Three samplings were deemed adequate to identify all of the meaningful BASs. BASs expressing similar structures were grouped to give the final set of BASs. This method was applied to HIV integrase and protease inhibitor activities in the MDL Drug Data Report (MDDR) database and to procaspase-3 activators in the PubChem BioAssay database, yielding 14, 12, and 18 BASs, respectively.

Conclusions: The proposed mining scheme successfully extracted meaningful substructures from large datasets of chemical structures. The resulting BASs were deemed reasonable by an experienced medicinal chemist. The mining itself requires about 3 days to extract BASs with a given physiological activity. Thus, the method described herein is an effective way to analyze large HTS databases.
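The balanced, repeated sampling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' KNIME workflow: the function name `balanced_samples`, the compound identifiers, and the choice to make the inactive draws disjoint across rounds are all assumptions made for illustration. The abstract states only that equal numbers of active and inactive compounds were sampled and that three samplings were deemed adequate.

```python
import random


def balanced_samples(actives, inactives, n_rounds=3, seed=0):
    """Return n_rounds (actives, inactive_draw) pairs of equal size.

    Assumption (not stated in the source): the inactive draws are
    disjoint across rounds, so each round sees different inactives.
    """
    rng = random.Random(seed)
    pool = list(inactives)
    rng.shuffle(pool)  # random order, reproducible via the seed
    k = len(actives)
    if n_rounds * k > len(pool):
        raise ValueError("not enough inactive compounds for disjoint draws")
    samples = []
    for i in range(n_rounds):
        # Consecutive slices of the shuffled pool never overlap.
        draw = pool[i * k:(i + 1) * k]
        samples.append((list(actives), draw))
    return samples


# Hypothetical identifiers mimicking a 1:1000 active-to-inactive ratio.
actives = [f"A{i}" for i in range(5)]
inactives = [f"I{i}" for i in range(5000)]

for round_no, (act, inact) in enumerate(balanced_samples(actives, inactives), 1):
    # Each round pairs all 5 actives with 5 inactives.
    print(round_no, len(act), len(inact))
```

Each balanced round would then be passed to the mining step (the cascade model in the paper), and the BAS candidates from all rounds would be pooled. Disjoint draws are one simple way to expose the miner to more of the inactive structural diversity; independent random draws per round would be an equally plausible reading of the text.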

Highlights

The PubChem database is a large-scale resource for chemical information, containing millions of chemical compound activities derived by high-throughput screening (HTS).
Basic active structures (BASs) have already been extracted for G-protein coupled receptor (GPCR)-related activity and repeated dose toxicity, and the results have been disclosed on the BASiC website [2].
BASs of three pharmacological activities were successfully mined from large-scale databases, including real HTS data in the PubChem BioAssay database.

Summary

Introduction

The PubChem database is a large-scale resource for chemical information, containing millions of chemical compound activities derived by high-throughput screening (HTS). The extraction of compounds with characteristic substructures and a certain physiological activity from large chemical databases is an important step in determining structure-activity relationships. Compounds with shared basic active structures (BASs) have already been mined from small datasets for G-protein coupled receptor (GPCR)-related activity and repeated dose toxicity, and the results have been disclosed on the BASiC website [2]. Rough set and activity landscape methods have provided useful suggestions as to the active substructure, but the number of molecules in the datasets was limited [6,7].



