Analyzing compound activity records and promiscuity degrees in light of publication statistics.

Ye Hu, Jürgen Bajorath

doi:10.12688/f1000research.8792.1

Abstract

For the generation of contemporary databases of bioactive compounds, activity information is usually extracted from the scientific literature. However, when activity data are analyzed, source publications are typically no longer taken into consideration. Therefore, compound activity data selected from ChEMBL were traced back to thousands of original publications, activity records including compound, assay, and target information were systematically generated, and their distributions across the literature were determined. In addition, publications were categorized on the basis of activity records. Furthermore, compound promiscuity, defined as the ability of small molecules to specifically interact with multiple target proteins, was analyzed in light of publication statistics, thus adding another layer of information to promiscuity assessment. It was shown that the degree of compound promiscuity was not influenced by increasing numbers of source publications. Rather, most non-promiscuous as well as promiscuous compounds, regardless of their degree of promiscuity, originated from single publications, which emerged as a characteristic feature of the medicinal chemistry literature.

Highlights

Given the large volumes of compounds and activity data that are becoming available in the public domain[1], mining of activity data can be expected to provide fresh insights into structure-activity relationships, compound distributions over current targets, or compound activity profiles.
One can distinguish between “good” and “bad” promiscuity: the latter results from assay artifacts due to, for example, undesired compound pan-assay interference[3,4] or aggregator[5] characteristics, whereas the former reflects the ability of small molecules to interact with multiple targets[2].
A total of 318,570 potency measurements were available and associated with 257,138 unique activity records, which were defined as individual compound-target entries containing all associated publications and qualifying potency measurements.
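The notion of an activity record can be illustrated with a minimal sketch. It assumes that each potency measurement is available as a (compound, target, publication, potency) tuple and that the degree of promiscuity is counted as the number of distinct targets per compound. The identifiers, values, and function names below are hypothetical, and the criteria for qualifying measurements applied in the study are not reproduced here.

```python
from collections import defaultdict

# Hypothetical potency measurements: (compound, target, publication, potency).
# All identifiers and values are invented for illustration only.
measurements = [
    ("cpd1", "T1", "pubA", 7.2),
    ("cpd1", "T1", "pubB", 7.5),
    ("cpd1", "T2", "pubA", 6.1),
    ("cpd2", "T1", "pubC", 8.0),
]


def build_activity_records(rows):
    """Group measurements into one record per compound-target pair."""
    records = defaultdict(lambda: {"publications": set(), "potencies": []})
    for compound, target, publication, potency in rows:
        record = records[(compound, target)]
        record["publications"].add(publication)
        record["potencies"].append(potency)
    return dict(records)


def promiscuity_degrees(records):
    """Count distinct targets per compound (assumed promiscuity degree)."""
    targets = defaultdict(set)
    for compound, target in records:
        targets[compound].add(target)
    return {compound: len(found) for compound, found in targets.items()}


def publication_counts(records):
    """Count distinct source publications per compound."""
    publications = defaultdict(set)
    for (compound, _target), record in records.items():
        publications[compound] |= record["publications"]
    return {compound: len(found) for compound, found in publications.items()}


records = build_activity_records(measurements)
degrees = promiscuity_degrees(records)
pub_counts = publication_counts(records)
```

In this toy data set, four measurements collapse into three activity records because two measurements share the same compound-target pair. Comparing `degrees` with `pub_counts` per compound is one simple way to relate promiscuity to the number of source publications.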

Summary

Introduction

Given the large volumes of compounds and activity data that are becoming available in the public domain[1], mining of activity data can be expected to provide fresh insights into structure-activity relationships, compound distributions over current targets, or compound activity profiles. Most recent analyses of compound promiscuity on the basis of high-confidence activity data from medicinal chemistry have revealed that compounds covering the current spectrum of thousands of targets are on average active against one or two targets[10]. This low degree of detectable promiscuity was found to be essentially stable over time, especially during periods of exponential compound data growth over the past decade[11]. These findings give rise to speculation concerning possible reasons for the higher degree of drug promiscuity[13].

Methods

Results

Conclusion