Abstract
For the generation of contemporary databases of bioactive compounds, activity information is usually extracted from the scientific literature. However, when activity data are analyzed, source publications are typically no longer taken into consideration. Therefore, compound activity data selected from ChEMBL were traced back to thousands of original publications, activity records including compound, assay, and target information were systematically generated, and their distributions across the literature were determined. In addition, publications were categorized on the basis of activity records. Furthermore, compound promiscuity, defined as the ability of small molecules to specifically interact with multiple target proteins, was analyzed in light of publication statistics, thus adding another layer of information to promiscuity assessment. It was shown that the degree of compound promiscuity was not influenced by increasing numbers of source publications. Rather, most non-promiscuous as well as promiscuous compounds, regardless of their degree of promiscuity, originated from single publications, which emerged as a characteristic feature of the medicinal chemistry literature.
Highlights
Given the large volumes of compounds and activity data that are becoming available in the public domain[1], mining of activity data can be expected to provide fresh insights into structure-activity relationships, compound distributions over current targets, or compound activity profiles
One can distinguish between “good” and “bad” promiscuity; the latter resulting from assay artifacts due to, for example, undesired compound pan-assay interference[3,4] or aggregator[5] characteristics; the former from the ability of small molecules to interact with multiple targets[2]
A total of 318,570 potency measurements were available and associated with 257,138 unique activity records, which were defined as individual compound-target entries containing all associated publications and qualifying potency measurements
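The definition of an activity record above can be illustrated with a minimal Python sketch. It groups individual potency measurements by compound-target pair and then counts, per compound, the number of distinct targets and source publications. The field names, identifiers, and toy values are hypothetical and are not taken from ChEMBL or the original study. Using the number of distinct targets as the degree of promiscuity is an assumption made here for illustration, and the criteria for "qualifying" measurements are not applied.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ActivityRecord:
    """One compound-target entry with its publications and potency values."""
    compound_id: str
    target_id: str
    publications: set = field(default_factory=set)
    potencies: list = field(default_factory=list)


def build_activity_records(measurements):
    """Group (compound, target, publication, potency) tuples by compound-target pair."""
    records = {}
    for compound_id, target_id, publication_id, potency in measurements:
        key = (compound_id, target_id)
        if key not in records:
            records[key] = ActivityRecord(compound_id, target_id)
        records[key].publications.add(publication_id)
        records[key].potencies.append(potency)
    return records


def summarize_compounds(records):
    """Map each compound to (number of distinct targets, number of source publications).

    Assumption: the number of distinct targets stands in for the degree of promiscuity.
    """
    targets = defaultdict(set)
    publications = defaultdict(set)
    for record in records.values():
        targets[record.compound_id].add(record.target_id)
        publications[record.compound_id] |= record.publications
    return {c: (len(targets[c]), len(publications[c])) for c in targets}


# Hypothetical toy measurements: (compound, target, publication, potency value)
measurements = [
    ("CPD1", "T1", "PUB1", 7.2),
    ("CPD1", "T1", "PUB2", 7.0),
    ("CPD1", "T2", "PUB1", 6.1),
    ("CPD2", "T1", "PUB3", 8.4),
]

records = build_activity_records(measurements)
summary = summarize_compounds(records)
```

In this toy input, four measurements collapse into three activity records, because the two measurements for the same compound-target pair share one record. This mirrors why the number of unique activity records is smaller than the number of potency measurements.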
Summary
Given the large volumes of compounds and activity data that are becoming available in the public domain[1], mining of activity data can be expected to provide fresh insights into structure-activity relationships, compound distributions over current targets, or compound activity profiles. Target annotations of bioactive compounds can be systematically extracted and their current degree of promiscuity determined[2]. In this context, one can distinguish between “good” and “bad” promiscuity: the latter results from assay artifacts due to, for example, undesired compound pan-assay interference[3,4] or aggregator[5] characteristics, whereas the former results from the ability of small molecules to interact with multiple targets[2]. Most recent analyses of compound promiscuity on the basis of high-confidence activity data from medicinal chemistry have revealed that compounds covering the current spectrum of thousands of targets are on average active against one or two targets[10]. These findings give rise to speculation concerning possible reasons for the higher degree of drug promiscuity[13].