Effects of Semantic Features on Machine Learning-Based Drug Name Recognition Systems: Word Embeddings vs. Manually Constructed Dictionaries

Shengyu Liu, Buzhou Tang, Qingcai Chen, Xiaolong Wang

doi:10.3390/info6040848

Abstract

Semantic features are very important for machine learning-based drug name recognition (DNR) systems. The semantic features used in most DNR systems are based on drug dictionaries manually constructed by experts. Building large-scale drug dictionaries is time-consuming, and adding new drugs to existing dictionaries immediately after they are developed is also a challenge. In recent years, word embeddings, which contain rich latent semantic information about words, have been widely used to improve the performance of various natural language processing tasks. However, they have not been used in DNR systems. Compared with semantic features based on drug dictionaries, the advantage of word embeddings is that they are learned without supervision. In this paper, we investigate the effect of semantic features based on word embeddings on DNR and compare them with semantic features based on three drug dictionaries. We propose a conditional random fields (CRF)-based system for DNR. The skip-gram model, an unsupervised algorithm, is used to induce word embeddings from about 17.3 gigabytes (GB) of unlabeled biomedical text collected from MEDLINE (National Library of Medicine, Bethesda, MD, USA). The system is evaluated on the drug-drug interaction extraction (DDIExtraction) 2013 corpus. Experimental results show that word embeddings significantly improve the performance of the DNR system and are competitive with semantic features based on drug dictionaries. The F-score improves by 2.92 percentage points when word embeddings are added to the baseline system, which is comparable to the improvements from semantic features based on drug dictionaries. Furthermore, word embeddings are complementary to the semantic features based on drug dictionaries. When both word embeddings and semantic features based on drug dictionaries are added, the system achieves its best performance, an F-score of 78.37%, which outperforms the best system of the DDIExtraction 2013 challenge by 6.87 percentage points.
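The figures in the abstract can be checked with a little arithmetic. The sketch below assumes that "F-score" means the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_score` and the variable names are illustrative only. Subtracting the reported margin from the reported best F-score implies that the best challenge system scored about 71.50%.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is this balanced variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent and percentage points.
best_f = 78.37   # best system: word embeddings plus dictionary features
margin = 6.87    # reported lead over the best DDIExtraction 2013 system

# Implied F-score of the best challenge system (derived, not quoted).
implied_challenge_best = round(best_f - margin, 2)
print(f"Implied best challenge F-score: {implied_challenge_best:.2f}%")
```

Because the F-score is a harmonic mean, it is pulled toward the lower of precision and recall, so a gain of several percentage points cannot come from improving only one of the two by a small amount.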

Highlights

Drug name recognition (DNR) is a critical step for drug information extraction such as drug interactions [1].
We first compare the performance of the conditional random fields (CRF)-based DNR systems when different semantic features, namely those based on each of the three drug dictionaries and those based on word embeddings, are added to the baseline system.
Both the semantic features based on drug dictionaries and the semantic features based on word embeddings are beneficial to DNR.

Summary

Introduction

Drug name recognition (DNR) is a critical step for drug information extraction such as drug interactions [1]. It comprises two tasks: detecting the boundaries of drug names in unstructured text (drug detection) and classifying the detected drug names into predefined categories (drug classification). It is a challenging task for various reasons. Machine learning-based methods are superior to the other two categories of methods because of their good performance and robustness when a large labeled corpus is available [2,5].
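The two subtasks can be illustrated with a small sketch. It assumes a BIO tagging scheme (B- opens a mention, I- continues it, O is outside), which is a common encoding for sequence labelers such as CRFs but is not confirmed by the text above. The function `decode_bio`, the example sentence, and the labels `drug` and `group` are illustrative only. The tag boundaries correspond to drug detection and the tag suffixes to drug classification.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-token BIO tags into (drug name, category) pairs."""
    mentions: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new mention starts; close any mention that is still open.
            if current:
                mentions.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continue the open mention of the same category.
            current.append(token)
        else:
            # "O" or an inconsistent I- tag closes any open mention.
            if current:
                mentions.append((" ".join(current), current_type))
            current, current_type = [], None

    if current:
        mentions.append((" ".join(current), current_type))
    return mentions


# Hypothetical sentence with hand-assigned tags (not taken from the corpus).
tokens = ["Aspirin", "may", "increase", "the", "effect", "of",
          "oral", "anticoagulants", "."]
tags = ["B-drug", "O", "O", "O", "O", "O", "B-group", "I-group", "O"]

mentions = decode_bio(tokens, tags)
print(mentions)
```

In a setup like this, a sequence labeler predicts one tag per token, and a decoding step of this kind recovers the mention boundaries and categories together.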



Journal: Information	Publication Date: Dec 11, 2015
Citations: 100	License type: CC BY 4.0
