Abstract
Medicinal chemists today find themselves in an increasingly information-rich environment. An abundance of compound activity and affinity data is being published, and medicinal chemistry data are increasingly connected with a broader world of data from the realms of bioinformatics and systems biology. In recent years, a number of publicly accessible, chemistry-oriented databases of interest to medicinal chemists have been established to facilitate access to medicinal chemistry data and their biological links, with the aim of accelerating the discovery of new medications. To maximize their usefulness, researchers in pertinent fields need to be fully aware of these resources and to exploit their full potential.

Decades of worldwide growth in the pharmaceutical industry and in academic drug discovery, along with technological advances that speed compound synthesis and assays1, and the advent and growth of the related fields of chemical biology and chemical genomics, have led to an ongoing flood of publications with valuable data on new compounds and their biological activities. On the order of 20,000 to 30,000 new compounds are now published per year in some of the main medicinal chemistry journals, and this rate has accelerated in recent years (as detailed below). However, publication in conventional journals traps data in a form where they are inaccessible to computer search and retrieval. For example, it is not possible to search standard scientific articles for compounds of interest or to reliably extract machine-readable representations of compounds from chemical drawings in articles. As a consequence, the conventional publishing paradigm can severely restrict the discoverability and usability of medicinal chemistry data. The parallel growth of information technology and the emergence of the World Wide Web in the 1990s have created important new opportunities for the dissemination of data.
Biologists – especially structural and molecular biologists – seized these opportunities, establishing central data resources like the Protein Data Bank2 and GenBank3 and laying the foundations for the field of bioinformatics. The first public protein-ligand database aimed at serving the drug discovery community, BindingDB, came online in late 2000. This resource has grown substantially and has since been joined by other important databases with related scopes and goals. According to Pathguide, a web directory of online databases, there are at least 43 protein-compound interaction databases4, 5, and many other useful chemical databases are now freely available6. Such resources are of increasing value not only for basic uses like finding and downloading structure-activity relationship (SAR) data for a protein target of interest, but also for emergent applications that become possible as the medicinal chemistry dataset grows to provide a comprehensive picture of small molecules in the larger biological context. For example, if a cell-based screen reveals that a new compound inhibits apoptosis, one might search for similar compounds known to bind apoptosis-related proteins and thus hypothesize that the new compound also binds one of these targets. Similarly, when prioritizing several lead compounds for further development, the observation that one lead is similar to a published compound known to bind a different target might prompt one to lower its priority in order to minimize off-target effects. In another scenario, marking all the proteins in a defined signaling pathway according to which ones are already targeted by FDA-approved drugs might suggest a multidrug therapy that maximally suppresses signaling.
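The first scenario above amounts to a nearest-neighbour similarity search against compounds with known targets. A minimal Python sketch follows, assuming each compound is represented as a set of "on" fingerprint bits and compared with the Tanimoto coefficient. The compound names, bit sets, target associations and the 0.5 similarity cutoff are all hypothetical; in practice, fingerprints would be computed with a cheminformatics toolkit and the reference set drawn from a database such as BindingDB or ChEMBL.

```python
# Toy sketch: hypothesize targets for a new compound from its most similar
# reference compounds. All compounds, bit sets, targets and the cutoff below
# are hypothetical; real fingerprints would come from a cheminformatics toolkit.
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # treat two empty fingerprints as dissimilar
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def hypothesize_targets(query: frozenset, reference: dict, cutoff: float = 0.5):
    """Rank targets by how many reference compounds with Tanimoto
    similarity >= cutoff to the query are known to bind them."""
    votes = Counter()
    for bits, targets in reference.values():
        if tanimoto(query, bits) >= cutoff:
            votes.update(targets)
    return votes.most_common()


# Hypothetical reference set: name -> (fingerprint bits, known targets).
reference = {
    "cmpd_A": (frozenset({1, 2, 3, 4, 5}), {"BCL2"}),
    "cmpd_B": (frozenset({1, 2, 3, 4, 9}), {"BCL2", "MCL1"}),
    "cmpd_C": (frozenset({10, 11, 12}), {"EGFR"}),
}
query = frozenset({1, 2, 3, 4, 6})  # hypothetical hit from a cell-based screen

print(hypothesize_targets(query, reference))  # [('BCL2', 2), ('MCL1', 1)]
```

Counting how many similar reference compounds support each target is only one simple way to rank hypotheses; a fuller analysis would also weigh reported affinities and the amount of data behind each association.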
Here, we aim first to help medicinal chemists take advantage of the growing array of freely accessible, medicinal chemistry-oriented databases by discussing three central resources focused on small-molecule binding and bioactivity (BindingDB, ChEMBL and PubChem) and by noting several other valuable small-molecule databases. (Readers interested in additional perspectives will enjoy other recent reviews7-12.) In particular, Section B seeks to help users past the initial barriers encountered when starting to use these rather complex resources by summarizing their organization and the methods for accessing key types of data, information that is not always easy to glean from their respective websites. Subsequent sections offer broader discussions of the field, and some readers may wish to jump directly to Section C, which analyzes the available medicinal chemistry data to derive informative overviews of these data, or to Section D, which offers views towards the future of online compound databases and their applications, including the possibility of integrating related databases to minimize overlapping efforts, the challenge of getting data into databases where they can be most useful, and the role of medicinal chemistry databases in systems biology and systems pharmacology.
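As one illustration of programmatic access, PubChem offers a REST-style interface (PUG REST) in which the query is encoded in the URL path. The sketch below only builds such a URL, assuming the compound/name/.../property/.../JSON pattern; the helper name `pubchem_property_url` and its default property list are our own illustrative choices, and the network fetch is left commented out.

```python
# Minimal sketch: build a PubChem PUG REST URL requesting selected properties
# for a compound looked up by name. The helper name and default property list
# are illustrative choices, not part of PubChem itself.
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(
    name: str,
    properties: tuple = ("MolecularFormula", "MolecularWeight", "InChIKey"),
) -> str:
    """Return a URL of the form .../compound/name/<name>/property/<props>/JSON."""
    encoded = quote(name, safe="")  # percent-encode spaces and other symbols
    return f"{PUG_REST_BASE}/compound/name/{encoded}/property/{','.join(properties)}/JSON"


url = pubchem_property_url("aspirin")
print(url)

# Fetching the result needs network access, e.g.:
# import json, urllib.request
# with urllib.request.urlopen(url) as response:
#     record = json.load(response)["PropertyTable"]["Properties"][0]
```

ChEMBL and BindingDB also provide web-service interfaces; their exact endpoints and parameters should be checked against each resource's current documentation rather than inferred from this example.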