Abstract

Understanding the relations between chemicals and diseases is crucial in various biomedical tasks, such as new drug discovery and new therapy development. Manually mining these relations from the biomedical literature is costly and time-consuming, and manually curated results are difficult to keep up to date. To address these issues, the BioCreative-V community proposed the challenging task of automatically extracting chemical-induced disease (CID) relations in order to benefit biocuration. This article describes our work on the CID relation extraction task of BioCreative-V. We built a machine learning based system that uses simple yet effective linguistic features to extract relations with maximum entropy models. In addition to these features, the hypernym relations between entity concepts derived from the Medical Subject Headings (MeSH) controlled vocabulary were employed during both the training and testing stages, to obtain more accurate classification models and better extraction performance, respectively. We reduced relation extraction between entities in documents to relation extraction between entity mentions. In our system, pairs of chemical and disease mentions at both the intra- and inter-sentence levels were first constructed as relation instances for training and testing; two classification models, one per level, were then trained on the training examples and applied to the testing examples. Finally, we merged the classification results from the mention level to the document level to obtain the final relations between chemicals and diseases. Using gold-standard entity annotations, our system achieved promising F-scores of 60.4% on the development dataset and 58.3% on the test dataset.

Database URL: https://github.com/JHnlp/BC5CIDTask
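
The mention-level pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Mention`, `Instance`, `build_instances` and `merge_to_document` are hypothetical, the concept identifiers are placeholders, the two classifiers are stand-in functions rather than trained maximum entropy models, and the merging rule (a document-level relation is kept if at least one of its mention pairs is classified as positive) is an assumption made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Mention:
    concept_id: str  # normalized concept identifier (placeholder values here)
    kind: str        # "Chemical" or "Disease"
    sentence: int    # index of the sentence that contains the mention


@dataclass(frozen=True)
class Instance:
    chemical: Mention
    disease: Mention

    @property
    def level(self) -> str:
        # Same sentence -> intra-sentence instance, otherwise inter-sentence.
        same = self.chemical.sentence == self.disease.sentence
        return "intra" if same else "inter"


def build_instances(mentions: Iterable[Mention]) -> List[Instance]:
    """Pair every chemical mention with every disease mention in a document."""
    mentions = list(mentions)
    chemicals = [m for m in mentions if m.kind == "Chemical"]
    diseases = [m for m in mentions if m.kind == "Disease"]
    return [Instance(c, d) for c, d in product(chemicals, diseases)]


def merge_to_document(
    instances: Iterable[Instance],
    classifiers: Dict[str, Callable[[Instance], bool]],
) -> Set[Tuple[str, str]]:
    """Lift mention-level decisions to document-level concept pairs.

    Assumed rule: a (chemical, disease) concept pair is reported if any of
    its mention pairs is classified as positive by the model for its level.
    """
    relations: Set[Tuple[str, str]] = set()
    for inst in instances:
        if classifiers[inst.level](inst):
            relations.add((inst.chemical.concept_id, inst.disease.concept_id))
    return relations


# Toy document: one chemical mentioned twice, two diseases.
mentions = [
    Mention("C1", "Chemical", 0),
    Mention("D1", "Disease", 0),
    Mention("D2", "Disease", 2),
    Mention("C1", "Chemical", 2),
]
instances = build_instances(mentions)

# Stand-in classifiers; a real system would call trained models here.
classifiers = {"intra": lambda inst: True, "inter": lambda inst: False}
relations = merge_to_document(instances, classifiers)
```

Keeping one classifier per level mirrors the abstract's description of separate intra- and inter-sentence models; the dictionary lookup on `inst.level` is just one convenient way to route each instance to the matching model.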

Highlights

  • With the rapid accumulation of the scientific literature, there is an increasing interest in extracting semantic relations between chemicals and diseases described in text repositories, as they play an important role in many areas in healthcare and biomedical research [1,2,3]. Identification of chemical–disease relations (CDRs), such as mechanistic and biomarker/correlative relations from the literature, can be helpful in developing chemicals for therapeutics and improving studies on chemical safety and toxicity

  • We report our approach to the chemical-induced disease (CID) relation extraction subtask of the BioCreative-V CDR task

  • We first present a brief introduction to the CDR corpus, and we systematically evaluate the performance of our approach on the corpus

Summary

Introduction

With the rapid accumulation of the scientific literature, there is an increasing interest in extracting semantic relations between chemicals and diseases described in text repositories, as they play an important role in many areas in healthcare and biomedical research [1,2,3]. Identification of chemical–disease relations (CDRs), such as mechanistic and biomarker/correlative relations from the literature, can be helpful in developing chemicals for therapeutics and improving studies on chemical safety and toxicity. The primary step for automatic CDR extraction is disease named entity recognition (DNER), which was found to be difficult in previous BioCreative tasks [14, 15]. For this subtask, participants were given the abstracts of raw PubMed articles and asked to return normalized concept identifiers for disease entities. In this subtask, both chemicals and diseases were described using the Medical Subject Headings (MeSH) controlled vocabulary.

