Abstract

Background: Developing structure–activity relationships (SARs) of molecules is an important approach in facilitating hit exploration in the early stage of drug discovery. Although information on millions of compounds and their bioactivities is freely available to the public, it is very challenging to infer a meaningful and novel SAR from that information.

Results: Research discussed in the present paper employed a bioactivity-centered clustering approach to group 843,845 non-inactive compounds stored in PubChem according to both structural similarity and bioactivity similarity, with the aim of mining bioactivity data in PubChem for useful SAR information. The compounds were clustered in three bioactivity similarity contexts: (1) non-inactive in a given bioassay, (2) non-inactive against a given protein, and (3) non-inactive against proteins involved in a given pathway. In each context, these small molecules were clustered according to their two-dimensional (2-D) and three-dimensional (3-D) structural similarities. The resulting 18 million clusters, named “PubChem SAR clusters”, were delivered in such a way that each cluster contains a group of small molecules similar to each other in both structure and bioactivity.

Conclusions: The PubChem SAR clusters, pre-computed using publicly available bioactivity information, make it possible to quickly navigate and narrow down the compounds of interest. Each SAR cluster can be a useful resource in developing a meaningful SAR or enable one to design or expand compound libraries from the cluster. It can also help to predict the potential therapeutic effects and pharmacological actions of less-known compounds from those of well-known compounds (i.e., drugs) in the same cluster.

Electronic supplementary material: The online version of this article (doi:10.1186/s13321-015-0070-x) contains supplementary material, which is available to authorized users.

Highlights

  • Developing structure–activity relationships (SARs) of molecules is an important approach in facilitating hit exploration in the early stage of drug discovery

  • The present study describes our preliminary work to build a new database resource from the PubChem3D project, namely, PubChem structure–activity relationship (SAR) clusters [29]

  • For Set C, compounds were declared to be non-inactive against at least one target protein sequence involved in a biological pathway or biosystem stored in NCBI’s BioSystems database [35]

Summary

Introduction

Developing structure–activity relationships (SARs) of molecules is an important approach in facilitating hit exploration in the early stage of drug discovery. PubChem [1–6] is a public repository for information on small molecules and their biological activities (hereafter called “bioactivities”). It holds a rich collection of chemical information, with more than 180 million depositor-provided substance descriptions, 60 million unique chemical structures, and one million biological assay results (as of December 2014). Traditional 2-D similarity methods sometimes fail to recognize structural similarity that can be detected with three-dimensional (3-D) similarity methods [13–16]. To address this issue, the PubChem3D project was launched [17–24]. It delivers tools and services that exploit 3-D molecular similarity between conformer models, which is quantified using the atom-centered Gaussian-shape comparison method by Grant and Pickup [25–28] (see the “Methods” section for more details on PubChem’s 3-D similarity method). The present study describes our preliminary work to build a new database resource from the PubChem3D project, namely, PubChem structure–activity relationship (SAR) clusters [29].
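
To make the idea of clustering within a bioactivity context concrete, the following is a minimal sketch, not the procedure used to build the PubChem SAR clusters. It assumes each compound is represented by a set of "on" fingerprint bits, uses the Tanimoto coefficient as a stand-in 2-D similarity measure, and groups compounds by a simple connected-components rule at a fixed threshold. The compound identifiers, fingerprints, assay names, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster_by_similarity(fingerprints: dict, threshold: float) -> list:
    """Group compounds into connected components of the 'similarity >= threshold' graph.

    This is a simple illustrative rule, assumed here for brevity; it is not
    the clustering algorithm described in the paper.
    """
    parent = {cid: cid for cid in fingerprints}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for cid in fingerprints:
        groups.setdefault(find(cid), set()).add(cid)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical toy data: fingerprints as sets of on-bit indices.
fingerprints = {
    "cpd1": frozenset({1, 2, 3, 4}),
    "cpd2": frozenset({1, 2, 3, 5}),
    "cpd3": frozenset({10, 11, 12}),
    "cpd4": frozenset({10, 11, 13}),
}

# Hypothetical bioactivity context: compounds that are non-inactive in each assay.
non_inactive = {
    "assay_A": {"cpd1", "cpd2", "cpd3"},
    "assay_B": {"cpd3", "cpd4"},
}

# Cluster structurally, separately within each bioactivity context.
clusters = {
    context: cluster_by_similarity(
        {cid: fingerprints[cid] for cid in sorted(members)}, threshold=0.5
    )
    for context, members in non_inactive.items()
}

for context, groups in clusters.items():
    print(context, groups)
```

In this toy setup the same compound (`cpd3`) can land in different clusters depending on the context, which is the point of grouping by bioactivity first and structure second. The same pattern would apply to the protein and pathway contexts by changing what the keys of `non_inactive` represent.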

