Abstract

This study presents an analysis of small-molecule bioactivity profiles across a large number of diverse protein families represented in PubChem BioAssay. We compared the bioactivity profiles of FDA-approved drugs with those of non-FDA-approved compounds and report several distinct patterns characteristic of the approved drugs. We found that much of the previously reported higher target promiscuity of FDA-approved compounds, relative to non-FDA-approved bioactives, was due to cross-reactivity within, rather than across, protein families. We identified 804 potentially novel protein target candidates for FDA-approved drugs, as well as 901 potentially novel target candidates that are hit by non-FDA-approved compounds but by no FDA-approved drug. We also identified 486,348 potentially novel compounds active against the same targets as FDA-approved drugs, as well as 153,402 potentially novel compounds active against targets with no active FDA-approved drug. By quantifying the agreement among replicated screens, we estimated that more than half of these novel outcomes are reproducible. Using biclustering, we identified many dense clusters of FDA-approved drugs with enriched activity against a common set of protein targets. We also estimate the distribution of compound promiscuity with a Bayesian statistical model and report the sensitivity and specificity of two common methods for identifying promiscuous compounds. Aggregator assays were more accurate at identifying highly promiscuous compounds, whereas PAINS substructures identified a much larger set of “middle-range” promiscuous compounds. Additionally, we report a large number of promiscuous compounds not identified as aggregators or PAINS. In summary, the results of this study provide a rich reference for selecting novel drug and target protein candidates and for eliminating candidate compounds with unselective activity.
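
The Bayesian promiscuity estimate and the sensitivity/specificity comparison described above can be illustrated with a minimal sketch. It assumes a simple beta-binomial model (a uniform Beta(1, 1) prior on each compound's hit ratio, updated with its active and tested counts), which is not necessarily the model used in the study. The hit-ratio threshold, the 0.95 posterior cutoff, the compound identifiers, the counts, and the PAINS flags are all hypothetical. A compound is called promiscuous when its posterior probability of a hit ratio above the threshold exceeds the cutoff, and a flagging method is then scored against that reference set.

```python
from math import exp, lgamma, log, log1p


def posterior_prob_above(actives: int, tested: int, threshold: float,
                         prior_a: int = 1, prior_b: int = 1) -> float:
    """P(hit ratio > threshold) under a Beta(prior_a, prior_b) prior.

    With integer Beta parameters the posterior tail has the closed form
    P(p > x) = P(Binomial(a + b - 1, x) <= a - 1), summed here in log space
    so that compounds tested in many assays do not overflow.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    a = prior_a + actives
    b = prior_b + tested - actives
    n = a + b - 1
    log_x, log_1mx = log(threshold), log1p(-threshold)
    total = 0.0
    for j in range(a):
        # Log of the Binomial(n, threshold) probability mass at j.
        log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log_x + (n - j) * log_1mx)
        total += exp(log_pmf)
    return min(total, 1.0)


def sensitivity_specificity(flagged, reference_positive, universe):
    """Score a flagging method against a reference set of positives."""
    universe = set(universe)
    flagged = set(flagged) & universe
    positives = set(reference_positive) & universe
    negatives = universe - positives
    tp = len(flagged & positives)   # true positives
    tn = len(negatives - flagged)   # true negatives
    sens = tp / len(positives) if positives else float("nan")
    spec = tn / len(negatives) if negatives else float("nan")
    return sens, spec


# Hypothetical (actives, tested) counts; not data from the study.
counts = {"cpd1": (40, 100), "cpd2": (2, 120), "cpd3": (15, 60),
          "cpd4": (0, 80), "cpd5": (1, 5)}
HIT_RATIO = 0.10   # hypothetical promiscuity threshold on the hit ratio
CONFIDENCE = 0.95  # hypothetical posterior probability cutoff

promiscuous = {cpd for cpd, (k, n) in counts.items()
               if posterior_prob_above(k, n, HIT_RATIO) > CONFIDENCE}
pains_flagged = {"cpd1", "cpd2"}  # hypothetical PAINS substructure matches
sens, spec = sensitivity_specificity(pains_flagged, promiscuous, counts)
print(sorted(promiscuous), sens, round(spec, 3))  # ['cpd1', 'cpd3'] 0.5 0.667
```

Note that cpd5, with 1 active out of 5 tests, is not called promiscuous despite its 20% observed hit ratio: with so few tests the posterior still leaves about 11% probability below the threshold, which is the main reason to prefer a posterior estimate over a raw hit ratio.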

Highlights

  • High-throughput screening (HTS) is a key technology for identifying bioactive small molecules for chemical genomics and drug discovery applications

  • We used the SMARTS-based Pan-assay interference compounds (PAINS) filters in the RDKit software library to identify compounds matched by PAINS filter families A, B, or C. These SMARTS filters derive from the SMARTS conversion published by Saubern et al. of the SLN-format filters originally published by Baell et al. [28, 49]. This identified 19,988 PAINS compounds and 298,166 non-PAINS compounds among the set of highly screened actives in PubChem BioAssay; 68 of the compounds we identified as PAINS are FDA-approved drugs

  • By systematically analyzing a large volume of public bioactivity data, we highlight several new patterns of bioactivity that may prove useful for informing drug discovery efforts

Summary

Introduction

High-throughput screening (HTS) is a key technology for identifying bioactive small molecules for chemical genomics and drug discovery applications. At the time of writing, the PubChem BioAssay database contains just over 230 million small-molecule bioactivity outcomes, over half of which involve activity against a clearly defined protein target [3]. It includes most of the bioactivity data available in the public domain, since it imports assays from many sources such as ChEMBL, and it provides negative (inactive) assay outcomes not reported in many other databases [4]. To investigate why FDA-approved drugs on average exhibit activity against more targets than non-FDA-approved compounds, we computed the target selectivity of small molecules against protein clusters obtained with three distinct methods that group protein sequences across increasingly large evolutionary distances. To investigate the frequency of highly promiscuous compounds, we used a statistical model to infer the hit ratio of each compound, and we report 1,157 likely promiscuous compounds not previously identified by two common methods for detecting promiscuous compounds: aggregator assays and PAINS substructures [12, 28].
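
The within- versus across-family distinction can be made concrete by counting, for each compound, both the distinct targets it hits and the distinct protein clusters those targets fall into. The sketch below is a minimal illustration under stated assumptions: it uses two hypothetical clustering levels with illustrative target-to-cluster assignments (the study used three clustering methods whose outputs are not reproduced here), and the compounds and their active-target lists are invented.

```python
def selectivity_profile(active_targets, cluster_of):
    """Return (distinct targets hit, distinct protein clusters hit)."""
    targets = set(active_targets)
    clusters = {cluster_of[t] for t in targets}
    return len(targets), len(clusters)


# Hypothetical target-to-cluster assignments at two evolutionary distances.
subfamily = {"ADRB1": "beta-adrenergic", "ADRB2": "beta-adrenergic",
             "DRD2": "dopamine", "HTR2A": "serotonin", "CYP3A4": "CYP3"}
superfamily = {"ADRB1": "GPCR", "ADRB2": "GPCR", "DRD2": "GPCR",
               "HTR2A": "GPCR", "CYP3A4": "cytochrome P450"}

# Hypothetical active-target lists per compound.
actives = {"cpd_a": ["ADRB1", "ADRB2", "DRD2", "HTR2A"],
           "cpd_b": ["ADRB2", "CYP3A4"]}

for compound, targets in actives.items():
    for level, mapping in (("subfamily", subfamily), ("superfamily", superfamily)):
        n_targets, n_clusters = selectivity_profile(targets, mapping)
        print(f"{compound} {level}: {n_targets} targets, {n_clusters} clusters")
```

Here cpd_a looks more promiscuous than cpd_b by raw target count (4 versus 2), yet at the coarser level all of its targets fall into a single cluster, while cpd_b's two targets span two. This is the pattern of within-family cross-reactivity that the study associates with much of the apparent promiscuity of FDA-approved drugs.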

Results and discussion
Methods
Conclusion
