Abstract

Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. Because CTD interactions use official gene symbols, this information can be combined with the Gene Ontology (GO) annotation file from NCBI Gene. By integrating these GO-gene annotations with CTD’s gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. We demonstrate the utility of this novel resource through a variety of applications. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and find that their target diseases show a greater degree of similarity when compared by GO terms than by genes. Next, a computational analysis predicts that seemingly unrelated diseases (e.g., stomach ulcers and atherosclerosis) are similar to bipolar disorder, and these predictions are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks for treating seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia, and we predict cisplatin and the small-molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of the extensive biological processes, molecular functions, and cellular components conferred by GO.
This inference set should aid researchers, bioinformaticists, and pharmaceutical drug makers in finding commonalities in disease mechanisms, which in turn could help identify new therapeutics, new indications for existing pharmaceuticals, potential disease comorbidities, and alerts for side effects.
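The proof-of-concept comparison above can be illustrated with a small sketch. The abstract does not name the similarity metric, so Jaccard similarity is an assumption here, and every disease, gene, and GO identifier below is hypothetical toy data rather than CTD content. The sketch shows how two diseases that share no curated genes can still share all of their inferred GO terms.

```python
"""Compare two diseases by shared genes vs. shared inferred GO terms.

Assumptions: Jaccard similarity is an illustrative choice, and all
disease, gene, and GO-term data are hypothetical toy values.
"""


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical curated gene-disease associations (not real CTD data).
disease_genes = {
    "disease_A": {"GENE1", "GENE2"},
    "disease_B": {"GENE3", "GENE4"},
}

# Hypothetical GO annotations for each gene.
gene_go = {
    "GENE1": {"GO:0000001", "GO:0000002"},
    "GENE2": {"GO:0000003"},
    "GENE3": {"GO:0000001", "GO:0000003"},
    "GENE4": {"GO:0000002"},
}


def inferred_go_terms(disease: str) -> set[str]:
    """Union of GO terms annotated to any gene associated with the disease."""
    terms: set[str] = set()
    for gene in disease_genes[disease]:
        terms |= gene_go.get(gene, set())
    return terms


gene_sim = jaccard(disease_genes["disease_A"], disease_genes["disease_B"])
go_sim = jaccard(inferred_go_terms("disease_A"), inferred_go_terms("disease_B"))
print(f"gene-level similarity: {gene_sim:.2f}")  # 0.00: no shared genes
print(f"GO-level similarity:   {go_sim:.2f}")    # 1.00: same three GO terms
```

In this toy case the gene-level score is 0.00 while the GO-level score is 1.00, which mirrors qualitatively (not quantitatively) the pattern reported for the target diseases of repositioned drugs.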

Highlights

  • Manual curation of the scientific literature helps standardize, harmonize, and organize disparate data into a structured format, making it more manageable and computable for analysis [1,2]

  • We have formatted all computed Gene Ontology (GO)-disease inferences into structured files that are freely available from the “Data Downloads” page for the three GO branches: cellular component (GO-CC), molecular function (GO-MF), and biological process (GO-BP) (Fig 1C)

  • GO-BP has the greatest number of associated inferences since, on average, genes tend to be annotated with more GO-BP than GO-MF or GO-CC terms
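Why annotation depth drives inference counts can be sketched with hypothetical toy data. Here an inference is assumed to be a distinct (GO term, disease) pair reachable through at least one shared gene; the real CTD files and counts differ. Because each GO term annotated to a gene is paired with every disease associated with that gene, a branch with more annotations per gene yields more inferences.

```python
"""Count GO-disease inferences per GO branch (hypothetical toy data).

Assumption: an inference is a distinct (GO term, disease) pair linked
by at least one shared gene.
"""
from collections import Counter

# Hypothetical GO annotations: gene -> list of (GO term, branch).
gene_go = {
    "GENE1": [("GO:0000001", "BP"), ("GO:0000002", "BP"), ("GO:0000010", "MF")],
    "GENE2": [("GO:0000003", "BP"), ("GO:0000020", "CC")],
}

# Hypothetical gene-disease associations: gene -> diseases.
gene_diseases = {
    "GENE1": ["disease_A", "disease_B"],
    "GENE2": ["disease_A"],
}


def count_inferences_by_branch() -> Counter:
    """Count distinct (term, disease) inferences in each GO branch."""
    pairs = set()
    for gene, annotations in gene_go.items():
        for term, branch in annotations:
            for disease in gene_diseases.get(gene, []):
                pairs.add((branch, term, disease))
    return Counter(branch for branch, _, _ in pairs)


print(count_inferences_by_branch())  # Counter({'BP': 5, 'MF': 2, 'CC': 1})
```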


Introduction

Manual curation of the scientific literature helps standardize, harmonize, and organize disparate data into a structured format, making it more manageable and computable for analysis [1,2]. In 2013, CTD collaborated with Pfizer scientists to manually curate 88,000 articles for interactions between 1,500 therapeutic drugs and diseases [6]. This collaboration extended the scope of CTD information beyond environmental chemicals and highlighted that understanding chemical toxicity is a goal shared by environmental health scientists and pharmaceutical drug developers alike. Integrating CTD’s three core data types (chemical-gene, chemical-disease, and gene-disease) yields chemical-gene-disease inferences that can be statistically evaluated and ranked [10]. This method of knowledge transfer can be applied to any type of gene-linked data, including Gene Ontology (GO) annotations.
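The knowledge-transfer idea described above, linking an entity to a disease through genes they share, can be sketched as a simple join. This is a minimal illustration under stated assumptions: the function name `infer_via_genes` and all data are hypothetical, and CTD's actual pipeline and its statistical scoring and ranking [10] are not reproduced. The same function accepts chemical-gene data or GO-gene annotations, which is the point the paragraph makes.

```python
"""Infer entity-disease links through shared genes (illustrative sketch).

Hypothetical helper and toy data; no statistical scoring is applied.
"""
from collections import defaultdict


def infer_via_genes(
    entity_genes: dict[str, set[str]],
    disease_genes: dict[str, set[str]],
) -> dict[tuple[str, str], set[str]]:
    """Map each (entity, disease) pair to the genes that connect them.

    `entity` can be a chemical (chemical-gene data) or a GO term
    (GO-gene annotations); a pair is kept only if a gene links them.
    """
    # Invert disease -> genes into gene -> diseases for fast lookup.
    gene_to_diseases: dict[str, set[str]] = defaultdict(set)
    for disease, genes in disease_genes.items():
        for gene in genes:
            gene_to_diseases[gene].add(disease)

    inferences: dict[tuple[str, str], set[str]] = defaultdict(set)
    for entity, genes in entity_genes.items():
        for gene in genes:
            for disease in gene_to_diseases.get(gene, ()):
                inferences[(entity, disease)].add(gene)
    return dict(inferences)


# Usage with hypothetical data: one chemical and one GO term.
disease_genes = {"disease_A": {"GENE1", "GENE2"}, "disease_B": {"GENE3"}}
chemical_genes = {"chem_X": {"GENE1", "GENE3"}}
go_genes = {"GO:0000001": {"GENE2"}}

for (chem, disease), genes in sorted(infer_via_genes(chemical_genes, disease_genes).items()):
    print(chem, disease, sorted(genes))
# chem_X disease_A ['GENE1']
# chem_X disease_B ['GENE3']
print(infer_via_genes(go_genes, disease_genes))
# {('GO:0000001', 'disease_A'): {'GENE2'}}
```

Keeping the supporting genes for each pair, rather than only the pair itself, preserves the evidence a downstream statistic would need to evaluate and rank each inference.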
