Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds

Christopher Southan, Sorel Muresan, Kiran Boppana, Sarma Arp Jagarlapudi

doi:10.1186/1758-2946-3-14

Open Access

https://doi.org/10.1186/1758-2946-3-14

Journal: Journal of Cheminformatics	Publication Date: May 13, 2011
Citations: 66	License type: CC BY 2.0

Affiliation: AstraZeneca (Sweden)

Abstract

Background: Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert curation, of bioactivity data extracted from patents and journal papers over the last 30 years.

Results: From a subset of just over 27,000 documents we have extracted a set of compound-to-target relationships for biochemical in vitro binding-type assay data for 1,736 human proteins and 1,654 gene identifiers. These are linked to 1,671,951 compound records derived from 823,179 unique chemical structures. The distribution showed a compounds-per-target average of 964 with a maximum of 42,869 (Factor Xa). The list includes non-targets, failed targets and cross-screening targets. The top-278 most actively pursued targets cover 90% of the compounds. We further investigated target ranking by determining the number of molecular frameworks and scaffolds. These were compared to the compound counts as alternative measures of chemical diversity on a per-target basis.

Conclusions: The compounds-per-protein listing generated in this work (provided as a supplementary file) represents the major proportion of the human drug target landscape defined by published data. We supplemented the simple ranking by the number of compounds assayed with additional rankings by molecular topology. These showed significant differences and provide complementary assessments of chemical tractability.

Highlights

An important factor in assessing the global progress in drug research is the number of targets for which therapeutic small-molecule modulators have been, are being, or could be, generated
The compounds-per-protein listing generated in this work represents the major proportion of the human drug target landscape defined by published data
We supplemented the simple ranking by the number of compounds assayed with additional rankings by molecular topology

Summary

Results

From a subset of just over 27,000 documents we have extracted a set of compound-to-target relationships for biochemical in vitro binding-type assay data for 1,736 human proteins and 1,654 gene identifiers. These are linked to 1,671,951 compound records derived from 823,179 unique chemical structures. The top-278 most actively pursued targets cover 90% of the compounds. We further investigated target ranking by determining the number of molecular frameworks and scaffolds. These were compared to the compound counts as alternative measures of chemical diversity on a per-target basis.
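The ranking and coverage statistics described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy compound and target identifiers, and the reading of "cover 90% of the compounds" as the union of distinct compounds across the top-ranked targets are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_targets(pairs):
    """Group (compound_id, target_id) pairs and rank targets by distinct compounds.

    Returns (ranking, compounds_by_target), where ranking is a list of
    (target_id, n_distinct_compounds) sorted by count descending, then by name.
    """
    compounds_by_target = defaultdict(set)
    for compound_id, target_id in pairs:
        compounds_by_target[target_id].add(compound_id)
    ranking = sorted(
        ((target, len(cpds)) for target, cpds in compounds_by_target.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return ranking, compounds_by_target


def targets_for_coverage(pairs, fraction=0.9):
    """Smallest number of top-ranked targets whose compounds reach `fraction`
    of all distinct compounds (assumed interpretation of coverage)."""
    ranking, compounds_by_target = rank_targets(pairs)
    all_compounds = set().union(*compounds_by_target.values())
    needed = fraction * len(all_compounds)
    covered = set()
    for n_targets, (target, _count) in enumerate(ranking, start=1):
        covered |= compounds_by_target[target]
        if len(covered) >= needed:
            return n_targets
    return len(ranking)


# Invented toy data: 10 distinct compounds assayed against 5 hypothetical targets.
TOY_PAIRS = [
    ("c1", "target_A"), ("c2", "target_A"), ("c3", "target_A"), ("c4", "target_A"),
    ("c4", "target_B"), ("c5", "target_B"), ("c6", "target_B"),
    ("c7", "target_C"), ("c8", "target_C"),
    ("c9", "target_D"),
    ("c10", "target_E"),
]

if __name__ == "__main__":
    ranking, _ = rank_targets(TOY_PAIRS)
    for target, count in ranking:
        print(target, count)
    print("targets needed for 90% coverage:", targets_for_coverage(TOY_PAIRS))
```

Counting distinct compounds in sets, rather than raw records, mirrors the distinction the text draws between compound records and unique chemical structures. A compound assayed against several targets is counted once per target in the ranking but only once in the coverage total.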

Conclusions

Introduction

Results and Discussion
