CHEMDNER system with mixed conditional random fields and multi-scale word clustering.

Yanan Lu, Donghong Ji, Xiaohui Liang, Xiaoyuan Yao, Xiaomei Wei

doi:10.1186/1758-2946-7-s1-s4

Abstract

Background: Chemical compound and drug name recognition plays an important role in chemical text mining, and it is the basis for automatic relation extraction and event identification in chemical information processing. A high-performance named entity recognition system for chemical compound and drug names is therefore necessary.

Methods: We developed a CHEMDNER system based on mixed conditional random fields (CRF) with word clustering for chemical compound and drug name recognition. For word clustering, we used Brown's hierarchical clustering algorithm and the Skip-gram model based on deep learning, trained on a large collection of PubMed articles (titles and abstracts).

Results: The system achieved the highest F-score, 88.20%, for the CDI task and the second-highest F-score, 87.11%, for the CEM task in BioCreative IV. Multi-scale clustering based on deep learning further improved performance, to F-scores of 88.71% for CDI and 88.06% for CEM.

Conclusions: The mixed CRF model represents both the internal complexity and the external context of the entities, and it is integrated with word clustering to capture domain knowledge from PubMed titles and abstracts. This domain knowledge helps ensure entity recognition performance even without fine-grained linguistic features or manually designed rules.
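
The abstract does not say exactly how the multi-scale clustering is built. One plausible reading, assumed here and not confirmed by the source, is to run k-means several times over the Skip-gram word vectors with different cluster counts k and expose each resulting cluster id to the CRF as its own feature, so that coarse and fine groupings are both available. The sketch below uses toy 2-D vectors in place of real Skip-gram embeddings, and the function names (`kmeans`, `multi_scale_clusters`, `cluster_features`) are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's k-means; centroids are seeded deterministically with the first k vectors."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        for i, v in enumerate(vectors):
            distances = [squared_distance(v, c) for c in centroids]
            assignment[i] = distances.index(min(distances))
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


def multi_scale_clusters(
    embeddings: Dict[str, Vector], scales: Sequence[int]
) -> Dict[str, Dict[int, int]]:
    """Cluster the same vocabulary once per scale k; returns word -> {k: cluster id}."""
    words = sorted(embeddings)
    vectors = [embeddings[w] for w in words]
    result: Dict[str, Dict[int, int]] = {w: {} for w in words}
    for k in scales:
        for word, cluster_id in zip(words, kmeans(vectors, k)):
            result[word][k] = cluster_id
    return result


def cluster_features(token: str, clusters: Dict[str, Dict[int, int]]) -> List[str]:
    """Turn a token into CRF feature strings: the word itself plus one cluster id per scale."""
    features = [f"w={token.lower()}"]
    for k, cluster_id in sorted(clusters.get(token.lower(), {}).items()):
        features.append(f"kmeans{k}={cluster_id}")
    return features


# Toy 2-D vectors standing in for real Skip-gram embeddings.
toy_embeddings = {
    "aspirin": [1.0, 0.0],
    "ibuprofen": [0.9, 0.1],
    "patients": [0.0, 1.0],
    "subjects": [0.1, 0.9],
}
clusters = multi_scale_clusters(toy_embeddings, scales=(2, 3))
print(cluster_features("Aspirin", clusters))  # → ['w=aspirin', 'kmeans2=0', 'kmeans3=0']
```

At k = 2 the two drug names share a cluster and the two population words share another; at k = 3 the drugs separate. A CRF can weight the coarse and fine ids independently, which is the intuition behind offering several scales at once.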

Highlights

Chemical compound and drug name recognition plays an important role in chemical text mining and is the basis for automatic relation extraction and event identification in chemical information processing
A high-performance named entity recognition system for chemical compound and drug names is necessary to ensure the performance of downstream biomedical text processing tasks
The results indicate that word clustering can improve the generalization ability of the conditional random fields (CRF) model
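
The generalization claim can be made concrete: a word that never appears in the labelled training data can still fire the same cluster features as words that do, so weights learned for those features carry over. A minimal sketch, assuming Brown clusters are stored as bit-string paths and that prefixes of several lengths serve as features (a common practice with Brown clusters, not something the source confirms for this system). The paths, prefix lengths and function names are invented for illustration.

```python
from typing import Dict, List, Sequence, Set


def brown_prefix_features(
    token: str, paths: Dict[str, str], lengths: Sequence[int] = (4, 6, 10)
) -> List[str]:
    """Emit one feature per prefix length of the token's Brown bit-string path.

    Words absent from the clustering vocabulary get no cluster features. A path
    shorter than n simply yields the whole path for that prefix length.
    """
    path = paths.get(token.lower())
    if path is None:
        return []
    return [f"brown{n}={path[:n]}" for n in lengths]


def shared_cluster_features(a: str, b: str, paths: Dict[str, str]) -> Set[str]:
    """Cluster features two tokens have in common, i.e. where learned CRF weights transfer."""
    return set(brown_prefix_features(a, paths)) & set(brown_prefix_features(b, paths))


# Invented toy paths; real ones would come from running Brown clustering on PubMed text.
toy_paths = {
    "aspirin": "1011010",
    "paracetamol": "1011011",
    "patients": "0110",
}

# Suppose "aspirin" occurs in the training data but "paracetamol" does not: the unseen
# word still shares its coarser cluster features with the seen one.
print(sorted(shared_cluster_features("aspirin", "paracetamol", toy_paths)))
# → ['brown4=1011', 'brown6=101101']
```

Short prefixes group broadly similar words while longer ones keep finer distinctions, so the CRF can back off to coarse evidence when a word is rare.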

Summary

Introduction

Chemical compound and drug name recognition plays an important role in chemical text mining and is the basis for automatic relation extraction and event identification in chemical information processing. It was listed as a task in BioCreative IV [2] with two sub-tasks: indexing documents with chemicals (chemical document indexing, CDI) and finding mentions of chemicals in text (chemical entity mention recognition, CEM). It is a named entity recognition (NER) task in natural language processing. Chemical compound and drug names may contain many symbols mixed with common words, e.g., ‘(22E,24R)-6β-methoxyergosta-7,22-diene-3β,5α-diol’. Another challenge is that an entity may span multiple phrases, e.g., ‘C35-fluoro, C35-difluoro, and C35-trifluorosolamins’, which is a coordinate structure. Such examples make recognition considerably harder.
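
The two sub-tasks differ mainly in the unit being scored: CEM scores individual mentions by their character offsets, while CDI scores the list of distinct chemicals per document. One simple way to produce CDI output from a mention tagger, assumed here, is to collect each document's distinct mention strings. The sketch compares unranked sets with exact matching and micro-averaged precision, recall and F-score; the official BioCreative IV evaluation also took prediction ranking into account, which this ignores. The data and function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (doc_id, start_offset, end_offset, mention_text)
Mention = Tuple[str, int, int, str]


def precision_recall_f1(gold: Set, predicted: Set) -> Tuple[float, float, float]:
    """Micro-averaged scores over exact-match items."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cem_items(mentions: Iterable[Mention]) -> Set[Tuple[str, int, int]]:
    """CEM: every mention counts separately, identified by its exact character span."""
    return {(doc, start, end) for doc, start, end, _ in mentions}


def cdi_items(mentions: Iterable[Mention]) -> Set[Tuple[str, str]]:
    """CDI: each document's distinct chemical names, so repeated mentions collapse."""
    return {(doc, text) for doc, _, _, text in mentions}


gold = [("d1", 0, 7, "aspirin"), ("d1", 20, 29, "ibuprofen"), ("d1", 40, 47, "aspirin")]
predicted = [("d1", 0, 7, "aspirin"), ("d1", 40, 47, "aspirin"), ("d1", 50, 55, "water")]

print("CEM P/R/F:", precision_recall_f1(cem_items(gold), cem_items(predicted)))
print("CDI P/R/F:", precision_recall_f1(cdi_items(gold), cdi_items(predicted)))
```

Here two of three mentions match at the CEM level (all scores 2/3), while only one of two distinct names matches at the CDI level (all scores 0.5), showing how the same tagger output can score differently on the two sub-tasks.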

Journal: Journal of Cheminformatics
Publication Date: Jan 19, 2015
Citations: 59
License type: CC BY 4.0
