Abstract

Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88 629 articles relating over 1 200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254 173 toxicogenomic interactions (152 173 chemical–disease, 58 572 chemical–gene, 5 345 gene–disease and 38 083 phenotype interactions). All chemical–gene–disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer’s text-mining process to collate the articles, and CTD’s curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. We also show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and can help develop testable hypotheses for drug–disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades’ worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities.

Database URL: http://ctdbase.org/
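As a quick consistency check on the figures above, the four interaction counts reported in the abstract can be summed and compared with the stated total. The sketch below is illustrative only: the dictionary keys and variable names are assumptions chosen for readability, not identifiers used by CTD.

```python
# Interaction counts as reported in the abstract.
# Key names are illustrative labels, not CTD field names.
interaction_counts = {
    "chemical-disease": 152_173,
    "chemical-gene": 58_572,
    "gene-disease": 5_345,
    "phenotype": 38_083,
}

# Total number of curated toxicogenomic interactions.
total = sum(interaction_counts.values())

# Share of each interaction type, as a percentage of the total.
shares = {
    name: round(100 * count / total, 1)
    for name, count in interaction_counts.items()
}

print(total)  # 254173
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The four categories add up to the 254 173 interactions given as the overall total.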

Highlights

  • As a means of gauging the type of information being captured, we evaluated the top 20 genes using CTD’s Set Analyzer tool to find their associated GO biological processes (GO-BP) (Figure 2B, inset).

  • Text mining and manual curation of the scientific literature are a way to discover and unlock vast amounts of data originally stored as free text by authors.

Introduction

Manual curation of the scientific literature is a specialized endeavor that transforms authors’ free-text information into annotated knowledge, via the use of controlled vocabularies and ontologies, by professional biocurators [1,2]. This process helps standardize, harmonize and organize disparate data from scientific publications into a structured format, making it more manageable and computable for analysis.

SIDER mines drug labels to create a database of drugs, side effects and side effect frequency [8]. Neither of these last two sources takes advantage of the scientific literature, in which drug-induced phenomena are documented in a variety of settings, such as in vitro and in vivo methods, across species, for approved indications, off-label uses and for drugs in development.
