A knowledge-poor approach to chemical-disease relation extraction.

Firoj Alam, Roberto Zanoli, Anna Corazza, Alberto Lavelli

doi:10.1093/database/baw071

Abstract

The article describes a knowledge-poor approach to the task of extracting Chemical-Disease Relations from PubMed abstracts. A first version of the approach was applied in our participation in BioCreative V track 3, both in Disease Named Entity Recognition and Normalization (DNER) and in Chemical-induced diseases (CID) relation extraction. For both tasks, we adopted a general-purpose approach based on machine learning techniques integrated with a limited number of domain-specific knowledge resources, using freely available tools for preprocessing data. Crucially, the system only uses the data sets provided by the organizers. The aim is to design an easily portable approach with a limited need for domain-specific knowledge resources. In the BioCreative V task, we ranked 5th out of 16 in DNER and 7th out of 18 in CID. In this article, we present our follow-up study, focusing in particular on CID, in which we perform further experiments, extend our approach and improve its performance.

Highlights

Manual curation of chemical-disease relations (CDRs) from the literature is expensive and it is difficult to keep up with the growing amount of relevant literature
In the spirit of better matching the actual requirements of practical applications, we decided to approach the tasks in the CDR track at BioCreative V, which differ in a few respects from the usual named entity recognition (NER) and relation extraction (RE) tasks
In the first version of the system, which participated in the BioCreative V CDR task [16], we only considered a Document Level Classifier (DLC), which takes a Feature Vector (FV) built from the abstract for every pair of chemical and disease entities
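
The document-level setup mentioned in the last highlight can be illustrated with a minimal sketch, assuming the entities in an abstract have already been recognized and normalized to concept identifiers. The `Entity` type, the `candidate_pairs` function and the identifiers `C1`, `C2`, `D1` are hypothetical names chosen for illustration; they are not taken from the authors' system.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    """A normalized entity mention in an abstract (hypothetical structure)."""
    concept_id: str  # placeholder identifier, e.g. a MeSH id in practice
    kind: str        # "Chemical" or "Disease"


def candidate_pairs(entities):
    """Return every distinct (chemical, disease) concept pair in one abstract.

    Each pair is one candidate instance for a document-level classifier;
    repeated mentions of the same concept collapse into a single id.
    """
    chemicals = sorted({e.concept_id for e in entities if e.kind == "Chemical"})
    diseases = sorted({e.concept_id for e in entities if e.kind == "Disease"})
    return list(product(chemicals, diseases))


entities = [
    Entity("C1", "Chemical"),
    Entity("C2", "Chemical"),
    Entity("D1", "Disease"),
    Entity("C1", "Chemical"),  # second mention of the same chemical
]
pairs = candidate_pairs(entities)
```

In this sketch each pair would then be turned into one feature vector built from the whole abstract, rather than from a single sentence.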

Summary

Introduction

Manual curation of chemical-disease relations (CDRs) from the literature is expensive, and it is difficult to keep up with the growing amount of relevant literature.

Comparative Toxicogenomics Database

As a domain-specific resource we have exploited the Comparative Toxicogenomics Database (CTD) [20], a publicly available database that aims to advance understanding of how environmental exposures to chemicals affect human health. It provides manually curated information about chemicals and diseases that, in our approach, is used to capture the different ways the entities are mentioned in texts.

Another problem that often arises in Named Entity Recognition on biomedical texts is the need to identify and resolve composite named entities, where a single span refers to more than one concept (e.g. neurological and cardiovascular toxicity). Only 1% of disease and chemical mentions in the provided data set are composite mentions, so we do not use any specific resource (e.g. the SimConcept tool) to deal with such cases.

We consider four binary relation features, each depending on both entities; they include the following:

Is the entity pair listed as a positive chemical-disease relation in the CTD [20]?
Do the mentions of both entities appear in the same sentence in the abstract?
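
The two feature questions above (CTD lookup and sentence co-occurrence) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `relation_features`, the representation of each sentence as a set of concept identifiers, and the `ctd_positive_pairs` set (assumed to have been extracted from the CTD beforehand) are all hypothetical.

```python
def relation_features(chemical_id, disease_id, sentences, ctd_positive_pairs):
    """Compute two binary relation features for one chemical-disease pair.

    sentences: list of sets, each holding the concept ids mentioned in one
        sentence of the abstract (assumed output of an earlier NER step).
    ctd_positive_pairs: set of (chemical_id, disease_id) tuples, assumed to
        have been pre-extracted from the CTD.
    """
    # Feature: is the pair listed as a positive relation in the lookup set?
    in_ctd = (chemical_id, disease_id) in ctd_positive_pairs
    # Feature: do both entities occur together in at least one sentence?
    same_sentence = any(
        chemical_id in s and disease_id in s for s in sentences
    )
    return {"in_ctd": int(in_ctd), "same_sentence": int(same_sentence)}


# Placeholder identifiers, not real database entries.
sentences = [{"C1", "D1"}, {"C2"}, {"D2"}]
ctd_positive_pairs = {("C2", "D2")}

features_a = relation_features("C1", "D1", sentences, ctd_positive_pairs)
features_b = relation_features("C2", "D2", sentences, ctd_positive_pairs)
```

Here `features_a` describes a pair that co-occurs in a sentence but is absent from the lookup set, while `features_b` describes a pair that is in the lookup set but only co-occurs at the document level.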

Experiments

Findings

Conclusions and future work


Journal: Database : the journal of biological databases and curation
Publication Date: Jan 1, 2016
Citations: 22
License type: cc-by


