CHEMDNER: The drugs and chemical names extraction challenge.

Martin Krallinger, Florian Leitner, Miguel Vazquez, Obdulia Rabal, Julen Oyarzabal, Alfonso Valencia

doi:10.1186/1758-2946-7-s1-s1

Abstract

Natural language processing (NLP) and text mining technologies for the chemical domain (ChemNLP or chemical text mining) are key to improving the access to and integration of information from unstructured data such as patents or the scientific literature. Therefore, the BioCreative organizers posed the CHEMDNER (chemical compound and drug name recognition) community challenge, which promoted the development of novel, competitive and accessible chemical text mining systems. This task allowed a comparative assessment of the performance of various methodologies, using as Gold Standard data a carefully prepared collection of text manually labeled by specially trained chemists. We evaluated two important aspects: one covered the indexing of documents with chemicals (chemical document indexing - CDI task), and the other was concerned with finding the exact mentions of chemicals in text (chemical entity mention recognition - CEM task). A total of 27 teams (23 academic and 4 commercial, comprising 87 researchers) returned results for the CHEMDNER tasks: 26 teams for the CEM task and 23 for the CDI task. Top-scoring teams obtained an F-score of 87.39% for the CEM task and 88.20% for the CDI task, a very promising result when compared to the agreement between human annotators (91%). The strategies used to detect chemicals included machine learning methods (e.g. conditional random fields) using a variety of features, chemistry and drug lexica, and domain-specific rules. We expect that the tools and resources resulting from this effort will have an impact on future developments of chemical text mining applications and will form the basis for finding related chemical information for the detected entities, such as toxicological or pharmacogenomic properties.
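The difference between the two tasks, and the way an F-score summarizes each, can be illustrated with a minimal sketch. It is not the official CHEMDNER evaluation script. It assumes that a CEM prediction counts as correct only when its character offsets match a gold mention exactly, and that CDI can be scored on the set of unique chemical strings per document. The function names, document ids and toy annotations are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical mention format: (start offset, end offset, surface text).
Mention = Tuple[int, int, str]


def micro_prf(gold: Dict[str, Set], pred: Dict[str, Set]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score.

    True positives, false positives and false negatives are pooled over
    all documents before the ratios are computed.
    """
    tp = fp = fn = 0
    for doc_id in set(gold) | set(pred):
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)   # predicted and in the gold standard
        fp += len(p - g)   # predicted but not in the gold standard
        fn += len(g - p)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def to_cdi(mentions: Dict[str, Set[Mention]]) -> Dict[str, Set[str]]:
    """Collapse mentions to unique chemical strings per document (assumed CDI view)."""
    return {doc: {text for _, _, text in ms} for doc, ms in mentions.items()}


# Toy annotations, invented for illustration only.
gold_cem = {
    "doc1": {(0, 7, "aspirin"), (30, 37, "aspirin"), (50, 57, "ethanol")},
    "doc2": {(5, 12, "glucose")},
}
pred_cem = {
    "doc1": {(0, 7, "aspirin"), (50, 57, "ethanol"), (60, 65, "water")},
    "doc2": set(),
}

cem_scores = micro_prf(gold_cem, pred_cem)                  # mention level
cdi_scores = micro_prf(to_cdi(gold_cem), to_cdi(pred_cem))  # document level
print("CEM P/R/F:", cem_scores)
print("CDI P/R/F:", cdi_scores)
```

In this toy case the second "aspirin" mention is missed, which lowers CEM recall but leaves the CDI score untouched, because the document is still indexed with "aspirin".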

Highlights

Unstructured data repositories contain fundamental descriptions of chemical entities, such as their targets and binding partners, metabolism, enzymatic reactions, potential adverse effects and therapeutic use, to name just a few.
For the chemical document indexing (CDI) task, this strategy obtained a micro-averaged F-score of 53.85%, while for the chemical entity mention recognition (CEM) task it reached a micro-averaged F-score of 57.11%.
The CHEMDNER task of BioCreative IV showed that recognizing chemical entities in PubMed abstracts is feasible for automated named entity recognition systems.

Summary

Introduction

Unstructured data repositories contain fundamental descriptions of chemical entities, such as their targets and binding partners, metabolism, enzymatic reactions, potential adverse effects and therapeutic use, to name just a few. Text-mining methods have shown promising results in the biomedical domain, where a considerable number of methods and applications have been published [2,3]. Knowing which compounds are described in a given paper, and where exactly those descriptions are, is key to selecting appropriate papers. With such fine-grained annotations, it is possible to point directly to relevant sentences and to extract more detailed chemical entity relations. The recognition of gene and protein mentions was addressed in several community challenges (BioCreative I, II, JNLPBA) that served to determine the state-of-the-art methodology and system performance [5,12], in addition to providing valuable datasets for developing new systems [15].

Journal: Journal of Cheminformatics	Publication Date: Jan 19, 2015
Citations: 233	License type: CC BY 4.0
