Abstract

Mining chemical-induced disease relations embedded in the vast biomedical literature could facilitate a wide range of computational biomedical applications, such as pharmacovigilance. In 2015, BioCreative V organized a Chemical Disease Relation (CDR) Track on extracting chemical-induced disease relations from biomedical literature. We participated in all subtasks of this challenge. In this article, we present our participating system, the Chemical Disease Relation Extraction SysTem (CD-REST), an end-to-end system for extracting chemical-induced disease relations from biomedical literature. CD-REST consists of two main components: (1) a chemical and disease named entity recognition and normalization module, which employs the Conditional Random Fields algorithm for entity recognition and a Vector Space Model-based approach for normalization; and (2) a relation extraction module that uses support vector machines to classify both sentence-level and document-level candidate chemical–disease pairs. Our system achieved the best performance on the chemical-induced disease relation extraction subtask of the BioCreative V CDR Track, demonstrating the effectiveness of our machine learning-based approaches for automatically extracting chemical-induced disease relations from biomedical literature. CD-REST provides web services that accept HTTP POST requests; these services can be accessed at http://clinicalnlptool.com/cdr. An online CD-REST demonstration system is available at http://clinicalnlptool.com/cdr/cdr.html.

Database URL: http://clinicalnlptool.com/cdr; http://clinicalnlptool.com/cdr/cdr.html
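The abstract names a Vector Space Model (VSM)-based approach for normalization without giving its details. As a minimal sketch, assuming TF-IDF term weighting with cosine similarity (a common VSM instantiation, not necessarily the exact one CD-REST uses), a recognized mention can be mapped to the concept whose synonym vector is closest. The lexicon, the concept IDs (`CONCEPT_...`), the `normalize` function and its `threshold` parameter are hypothetical placeholders; a real system would index MeSH terms and their synonyms instead.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon: concept ID -> synonyms.
# IDs are illustrative placeholders, not real MeSH identifiers.
LEXICON = {
    "CONCEPT_HYPOTENSION": ["hypotension", "low blood pressure"],
    "CONCEPT_NEPHROTOXICITY": ["nephrotoxicity", "kidney toxicity", "renal toxicity"],
    "CONCEPT_SEIZURES": ["seizures", "convulsions"],
}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Each synonym string is treated as one "document" for IDF purposes.
ENTRIES = [(cid, syn) for cid, syns in LEXICON.items() for syn in syns]
DF = Counter(term for _, syn in ENTRIES for term in set(tokenize(syn)))
N = len(ENTRIES)


def idf(term):
    """Smoothed IDF; terms unseen in the lexicon get the largest weight."""
    return math.log((1 + N) / (1 + DF.get(term, 0))) + 1.0


def tfidf(text):
    """Sparse TF-IDF vector as a dict term -> weight."""
    counts = Counter(tokenize(text))
    return {t: c * idf(t) for t, c in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Precompute one vector per synonym.
SYN_VECTORS = [(cid, tfidf(syn)) for cid, syn in ENTRIES]


def normalize(mention, threshold=0.5):
    """Return (concept_id, score) for the most similar synonym,
    or (None, score) if the best score is below the threshold."""
    query = tfidf(mention)
    best_id, best_score = None, 0.0
    for cid, vec in SYN_VECTORS:
        score = cosine(query, vec)
        if score > best_score:
            best_id, best_score = cid, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)


if __name__ == "__main__":
    for m in ["blood pressure low", "toxicity of the kidney", "aspirin"]:
        print(m, "->", normalize(m))
```

Because a VSM compares bags of weighted terms, word order is ignored: "blood pressure low" matches the synonym "low blood pressure" exactly, and "toxicity of the kidney" resolves to the nephrotoxicity concept because it shares the rarer term "kidney" with "kidney toxicity". A mention with no vocabulary overlap, such as "aspirin", scores 0 and is left unnormalized.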

Highlights

  • Over the past decades, extensive biomedical studies have assessed the relations between chemicals and diseases, resulting in a huge volume of literature on complex chemical–disease relations

  • We present the Chemical Disease Relation Extraction SysTem (CD-REST) built for the BioCreative V CDR Track

  • We explored four different knowledge bases: MeSH, CTD, MEDication Indication Resource (MEDI) [27] and Side Effect Resource (SIDER) [28]


Summary

Introduction

Extensive biomedical studies have been conducted to assess the relations between chemicals and diseases, resulting in a huge volume of literature on complex chemical–disease relations (e.g. treatment or adverse events). Efforts have also been made to build comprehensive databases of chemical–disease relations drawn from the literature. For example, the Comparative Toxicogenomics Database (CTD) [1] contains chemical–disease associations that biocurators have manually extracted from the biomedical literature. Natural language processing (NLP) methods that automatically detect chemical and disease concepts, as well as their relations, in biomedical literature have shown great potential for facilitating biomedical curation [2,3,4]. Automated extraction of chemical–disease relations from literature requires two steps: 1) named entity recognition (NER), to identify chemical and disease entities in narrative text; and 2) relation extraction, to determine the relation between each pair of chemical and disease entities in a document.
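The two-step pipeline leaves open which chemical–disease pairs are handed to the relation classifier. The sketch below shows one plausible way to build the sentence-level and document-level candidates mentioned in the abstract, assuming NER and normalization have already produced concept-identified mentions for each sentence. The `Mention` type, the `candidate_pairs` function and the concept IDs are hypothetical; CD-REST's actual candidate definitions may differ (for example, in whether document-level candidates exclude pairs that already co-occur in a sentence).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    text: str
    kind: str        # assumed labels: "Chemical" or "Disease"
    concept_id: str  # normalized identifier (hypothetical here)


def candidate_pairs(sentences):
    """sentences: list of sentences, each a list of Mention objects.

    Returns (sentence_level, document_level), two sets of
    (chemical_id, disease_id) tuples:
      - sentence_level: pairs that co-occur in at least one sentence
      - document_level: every chemical x disease pair in the document
    """
    sentence_level = set()
    doc_chems, doc_diseases = set(), set()
    for mentions in sentences:
        chems = {m.concept_id for m in mentions if m.kind == "Chemical"}
        diseases = {m.concept_id for m in mentions if m.kind == "Disease"}
        sentence_level.update(product(chems, diseases))
        doc_chems |= chems
        doc_diseases |= diseases
    document_level = set(product(doc_chems, doc_diseases))
    return sentence_level, document_level


if __name__ == "__main__":
    doc = [
        [Mention("lidocaine", "Chemical", "C1"), Mention("seizures", "Disease", "D1")],
        [Mention("hypotension", "Disease", "D2")],
    ]
    sent, full = candidate_pairs(doc)
    print("sentence-level:", sent)
    print("cross-sentence only:", full - sent)
```

In this example the pair `("C1", "D2")` appears only in the document-level set, because the chemical and the second disease are never mentioned in the same sentence. A sentence-level classifier alone cannot capture cross-sentence pairs like this one, which is why document-level candidates are useful.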
