Drug Name Recognition Research Articles

Semantic features are very important for machine learning-based drug name recognition (DNR) systems. The semantic features used in most DNR systems are based on drug dictionaries manually constructed by experts. Building large-scale drug dictionaries is time-consuming, and adding new drugs to existing dictionaries as soon as they are developed is also a challenge. In recent years, word embeddings, which contain rich latent semantic information about words, have been widely used to improve the performance of various natural language processing tasks. However, they have not been used in DNR systems. Compared with semantic features based on drug dictionaries, the advantage of word embeddings is that they are learned without supervision. In this paper, we investigate the effect of semantic features based on word embeddings on DNR and compare them with semantic features based on three drug dictionaries. We propose a conditional random fields (CRF)-based system for DNR. The skip-gram model, an unsupervised algorithm, is used to induce word embeddings from about 17.3 gigabytes (GB) of unlabeled biomedical text collected from MEDLINE (National Library of Medicine, Bethesda, MD, USA). The system is evaluated on the drug-drug interaction extraction (DDIExtraction) 2013 corpus. Experimental results show that word embeddings significantly improve the performance of the DNR system and are competitive with semantic features based on drug dictionaries. The F-score improves by 2.92 percentage points when word embeddings are added to the baseline system, a gain comparable to the improvements from semantic features based on drug dictionaries. Furthermore, word embeddings are complementary to the semantic features based on drug dictionaries. When both word embeddings and semantic features based on drug dictionaries are added, the system achieves its best performance with an F-score of 78.37%, which outperforms the best system of the DDIExtraction 2013 challenge by 6.87 percentage points.
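To make the contrast between the two kinds of semantic features concrete, here is a minimal sketch of per-token feature extraction for a CRF-style tagger. It is not the system described in the abstract: the toy dictionary, the two-dimensional embeddings, the feature names, and the choice to feed each embedding dimension in as a real-valued feature are all assumptions made for illustration. The abstract does not say how the embeddings enter the CRF.

```python
from typing import Dict, List, Set

# Hypothetical toy resources (assumptions). A real system would load an
# expert-built drug dictionary and embeddings induced by skip-gram from
# unlabeled text.
DRUG_DICTIONARY: Set[str] = {"aspirin", "warfarin"}
EMBEDDINGS: Dict[str, List[float]] = {
    "aspirin": [0.8, -0.1],
    "warfarin": [0.7, -0.2],
    "with": [-0.3, 0.5],
}


def token_features(tokens: List[str], i: int) -> Dict[str, float]:
    """Build a feature dict for tokens[i], as a CRF toolkit might consume."""
    word = tokens[i].lower()
    features: Dict[str, float] = {
        "bias": 1.0,
        "is_capitalized": float(tokens[i][:1].isupper()),
        # Dictionary-based semantic feature: exact lookup in the lexicon.
        "in_drug_dictionary": float(word in DRUG_DICTIONARY),
    }
    # Embedding-based semantic features: one real-valued feature per dimension.
    # Words without an embedding contribute no embedding features.
    for d, value in enumerate(EMBEDDINGS.get(word, [])):
        features[f"emb_{d}"] = value
    return features


sentence = ["Aspirin", "interacts", "with", "warfarin"]
feature_sequence = [token_features(sentence, i) for i in range(len(sentence))]
```

In this sketch a drug missing from the dictionary gets no help from `in_drug_dictionary` but would still receive `emb_*` features if it occurs in the unlabeled text, which is one way to read the claim that the two feature types are complementary.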

Background: The functions of chemical compounds and drugs that affect biological processes, and their particular effects on the onset and treatment of diseases, have attracted increasing interest with the advancement of research in the life sciences. To extract knowledge from the extensive literature on such compounds and drugs, the organizers of BioCreative IV administered the CHEMical Compound and Drug Named Entity Recognition (CHEMDNER) task to establish a standard dataset for evaluating state-of-the-art chemical entity recognition methods.

Methods: This study introduces the approach of our CHEMDNER system. Instead of emphasizing the development of novel feature sets for machine learning, this study investigates the effect of various tag schemes on the recognition of the names of chemicals and drugs using conditional random fields. Experiments were conducted using combinations of different tokenization strategies and tag schemes to investigate the effects of tag set selection and tokenization method on the CHEMDNER task.

Results: This study reports CHEMDNER performance for three more representative tag schemes (IOBE, IOBES, and IOB12E) alongside the widely used IOB tag set, each combined with coarse- and fine-grained tokenization methods. The experimental results reveal that the fine-grained tokenization strategy performs best in terms of precision, recall, and F-score when the IOBES tag set is used. The IOBES model with fine-grained tokenization yielded the best F-scores in the six chemical entity categories other than the "Multiple" entity category. Nonetheless, no significant improvement was observed when a more representative tag scheme was used with either the coarse- or fine-grained tokenization rules. The best F-scores achieved by the developed system on the test dataset of the CHEMDNER task were 0.833 and 0.815 for the chemical document indexing and the chemical entity mention recognition tasks, respectively.

Conclusions: The results highlight the importance of tag set selection and the use of different tokenization strategies. Fine-grained tokenization combined with the IOBES tag set most effectively recognizes chemical and drug names. To the best of the authors' knowledge, this is the first comprehensive investigation of various tag schemes combined with different tokenization strategies for the recognition of chemical entities.
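For readers unfamiliar with the tag schemes named above, the following sketch converts IOB tags to IOBES, where `E-` marks the last token of a multi-token entity and `S-` marks a single-token entity. It assumes the input is in the IOB2 convention (every entity starts with `B-`); the function name and the `DRUG` label are hypothetical and not taken from the CHEMDNER system.

```python
from typing import List


def iob_to_iobes(tags: List[str]) -> List[str]:
    """Convert IOB2 tags (assumed: every entity starts with B-) to IOBES."""
    out: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            out.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            out.append(f"I-{label}" if continues else f"E-{label}")
    return out


iob = ["B-DRUG", "I-DRUG", "I-DRUG", "O", "B-DRUG", "O"]
iobes = iob_to_iobes(iob)
# iobes == ["B-DRUG", "I-DRUG", "E-DRUG", "O", "S-DRUG", "O"]
```

The richer scheme gives the sequence model explicit evidence about entity boundaries, at the cost of a larger label set.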

Articles published on Drug Name Recognition

Exploration of biomedical knowledge for recurrent glioblastoma using natural language processing deep learning models

Medicine Drug Name Detection Based Object Recognition Using Augmented Reality.

A Novel Genetic Artificial Bee Inspired Neural Network Model for Drug Name Recognition

A Method for Identifying Local Drug Names in Xinjiang Based on BERT-BiLSTM-CRF

Terminologies augmented recurrent neural network model for clinical named entity recognition.

A two-stage deep learning approach for extracting entities and relationships from medical texts.

PharmacoNER Tagger: a deep learning-based tool for automatically finding chemicals and drugs in Spanish medical texts.

LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools

An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition.

Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition

LSTM-CRF for Drug-Named Entity Recognition

A Novel Approach towards Medical Entity Recognition in Chinese Clinical Text.

A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text.

Effects of Semantic Features on Machine Learning-Based Drug Name Recognition Systems: Word Embeddings vs. Manually Constructed Dictionaries

Drug Name Recognition: Approaches and Resources

Text mining for pharmacovigilance: Using machine learning for drug name recognition and drug–drug interaction extraction and classification

Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization.

CHEMDNER system with mixed conditional random fields and multi-scale word clustering.

CHEMDNER: The drugs and chemical names extraction challenge.

A document processing pipeline for annotating chemical entities in scientific documents.
