PubChem3D: Biologically relevant 3-D similarity

Sunghwan Kim, Evan E Bolton, Stephen H Bryant

doi:10.1186/1758-2946-3-26


Open Access

https://doi.org/10.1186/1758-2946-3-26


Abstract

Background: The use of 3-D similarity techniques in the analysis of biological data and virtual screening is pervasive, but what is a biologically meaningful 3-D similarity value? Can one find statistically significant separation between "active/active" and "active/inactive" spaces? These questions are explored using 734,486 biologically tested chemical structures, 1,389 biological assay data sets, and six different 3-D similarity types utilized by PubChem analysis tools.

Results: The similarity value distributions of 269.7 billion unique conformer pairs from 734,486 biologically tested compounds (all-against-all) from PubChem were used to work towards an answer to the question: what is a biologically meaningful 3-D similarity score? The average and standard deviation for the six similarity measures ST^ST-opt, CT^ST-opt, ComboT^ST-opt, ST^CT-opt, CT^CT-opt, and ComboT^CT-opt were 0.54 ± 0.10, 0.07 ± 0.05, 0.62 ± 0.13, 0.41 ± 0.11, 0.18 ± 0.06, and 0.59 ± 0.14, respectively. Because this random distribution of biologically tested compounds was constructed using a single theoretical conformer per compound (the "default" conformer provided by PubChem), further study using multiple diverse conformers per compound may be necessary; however, given the breadth of the compound set, the single-conformer results may still apply to multi-conformer 3-D similarity value distributions. As such, this work is a critical step, covering a very wide corpus of chemical structures and biological assays and creating a statistical framework to build upon.

The second part of this study explored whether it is possible to realize a statistically meaningful 3-D similarity value separation between reputed biological assay "inactives" and "actives". Using noninactive-noninactive (NN) pairs and noninactive-inactive (NI) pairs to represent the "active/active" and "active/inactive" spaces, respectively, each of the 1,389 biological assays was examined in terms of the 3-D similarity score differences between its NN and NI pairs, and the results were analyzed across all assays and by assay category type. While a consistent trend of separation was observed, the separation was not statistically unambiguous once the respective standard deviations were considered. Not all "actives" in a biological assay are amenable to this type of analysis, e.g., due to different mechanisms of action or binding configurations, but the ambiguous separation may also be due to the use of a single conformer per compound in this study. That said, there was a subset of biological assays in which a clear separation between the NN and NI pairs was found. In addition, combo Tanimoto (ComboT) alone, independent of superposition optimization type, appears to be the most efficient 3-D score type for identifying these cases.

Conclusion: This study provides a statistical guideline for analyzing biological assay data in terms of 3-D similarity and PubChem structure-activity analysis tools. When using a single conformer per compound, a relatively small number of assays appear to be able to separate "active/active" space from "active/inactive" space.
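The NN versus NI comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the toy compound IDs and scores, and the separation criterion (mean gap larger than the sum of the two standard deviations) are all assumptions chosen only to show the idea of comparing the two pair populations.

```python
from itertools import combinations, product
from statistics import mean, stdev


def pair_score_summary(scores, noninactive, inactive):
    """Return ((mean, sd) of NN pairs, (mean, sd) of NI pairs).

    scores maps frozenset({cid_a, cid_b}) to a similarity value.
    Each population needs at least two pairs for stdev to be defined.
    """
    nn_pairs = combinations(sorted(noninactive), 2)
    ni_pairs = product(sorted(noninactive), sorted(inactive))
    nn = [scores[frozenset(pair)] for pair in nn_pairs]
    ni = [scores[frozenset(pair)] for pair in ni_pairs]
    return (mean(nn), stdev(nn)), (mean(ni), stdev(ni))


def clearly_separated(nn_stats, ni_stats):
    """Hypothetical criterion: the mean gap exceeds the sum of both SDs."""
    (nn_mean, nn_sd), (ni_mean, ni_sd) = nn_stats, ni_stats
    return (nn_mean - ni_mean) > (nn_sd + ni_sd)


# Invented ComboT values for five hypothetical compounds:
# 1, 2, 3 are "noninactive"; 4, 5 are "inactive".
toy_scores = {
    frozenset({1, 2}): 1.20,
    frozenset({1, 3}): 1.10,
    frozenset({2, 3}): 1.30,
    frozenset({1, 4}): 0.60,
    frozenset({1, 5}): 0.70,
    frozenset({2, 4}): 0.65,
    frozenset({2, 5}): 0.55,
    frozenset({3, 4}): 0.60,
    frozenset({3, 5}): 0.70,
}

nn_stats, ni_stats = pair_score_summary(toy_scores, {1, 2, 3}, {4, 5})
print(nn_stats, ni_stats, clearly_separated(nn_stats, ni_stats))
```

Under this assumed criterion, two populations whose means differ by much less than their spreads, as with the overall ComboT averages quoted above, would not count as separated.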

Highlights

While the PubChem Substance database contains information provided by individual depositors, the PubChem Compound database contains the unique standardized chemical structure contents extracted from the PubChem Substance database.
Notations: In the present study, we consider six different similarity measures: shape Tanimoto (ST), color Tanimoto (CT), and combo Tanimoto (ComboT), each for two different optimization types. They are denoted with a superscript, which represents the optimization type, and a subscript, which specifies the type of CID pairs ("NN" for the NN pairs and "NI" for the NI pairs).
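The excerpt does not spell out the formulas behind these measures. The block below gives the usual Tanimoto form as it is generally described for PubChem3D, offered here as an assumption rather than as a quotation from the paper: V_AB is the overlap volume of conformers A and B, V_AA and V_BB are their self-overlap volumes, and the index f runs over feature ("color") types. The last line shows how the superscript and subscript notation combine.

```latex
\begin{align}
\mathrm{ST} &= \frac{V_{AB}}{V_{AA} + V_{BB} - V_{AB}} \\
\mathrm{CT} &= \frac{\sum_f V^{f}_{AB}}{\sum_f V^{f}_{AA} + \sum_f V^{f}_{BB} - \sum_f V^{f}_{AB}} \\
\mathrm{ComboT} &= \mathrm{ST} + \mathrm{CT} \\
% Example of the notation: ST-optimized ComboT score over NN pairs
&\mathrm{ComboT}^{\text{ST-opt}}_{\text{NN}}
\end{align}
```

With ST and CT each ranging from 0 to 1, ComboT ranges from 0 to 2. The averages reported in the abstract are consistent with this additive form to within rounding (0.54 + 0.07 against 0.62, and 0.41 + 0.18 = 0.59).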

Summary

Introduction

Recent advances in combinatorial chemistry [1,2,3,4,5,6] and high-throughput screening technology [7,8,9,10,11,12,13,14,15,16,17] have made the synthesis and screening of diverse chemical compounds easier, helping to create a demand in the biomedical research community for archives of publicly available screening data. PubChem provides various analysis tools to relate chemical structures to the biological activity data stored in the PubChem BioAssay database (unique identifier AID).



Journal: Journal of Cheminformatics
Publication Date: Jul 22, 2011
Citations: 51
License type: CC BY 2.0
