Abstract

Background: In recent years, deep learning methods have been applied to many natural language processing tasks, achieving state-of-the-art performance. In the biomedical domain, however, they have not outperformed supervised word sense disambiguation (WSD) methods based on support vector machines or random forests, possibly due to inherent similarities among medical word senses.

Results: In this paper, we propose two deep-learning-based models for supervised WSD: a model based on a bi-directional long short-term memory (BiLSTM) network, and an attention model based on the self-attention architecture. Our results show that the BiLSTM neural network model with a suitable upper-layer structure outperforms the existing state-of-the-art models on the MSH WSD dataset, while our attention model runs three to four times faster than our BiLSTM model with good accuracy. In addition, we trained "universal" models that disambiguate all ambiguous words together; in these models, the embedding of the target ambiguous word is concatenated to the max-pooled vector as a "hint". Our universal BiLSTM neural network model achieved about 90 percent accuracy.

Conclusion: Deep contextual models based on sequential information processing methods are able to capture relative contextual information from pre-trained input word embeddings, providing state-of-the-art results for supervised biomedical WSD tasks.
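
To make the universal model concrete, the following is a minimal PyTorch sketch under stated assumptions: the class name, dimensions, and layer choices are illustrative, not the authors' implementation. A BiLSTM encodes the pre-trained context embeddings, its outputs are max-pooled over time, and the embedding of the target ambiguous word is concatenated as the "hint" before a linear classifier over candidate senses.

    import torch
    import torch.nn as nn

    class UniversalBiLSTMWSD(nn.Module):
        # Hypothetical sketch of the universal BiLSTM WSD model described above.
        def __init__(self, emb_dim=200, hidden_dim=256, num_senses=2):
            super().__init__()
            self.lstm = nn.LSTM(emb_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
            # max-pooled BiLSTM states (2*hidden_dim) + "hint" embedding (emb_dim)
            self.classifier = nn.Linear(2 * hidden_dim + emb_dim, num_senses)

        def forward(self, context_embs, target_emb):
            # context_embs: (batch, seq_len, emb_dim); target_emb: (batch, emb_dim)
            states, _ = self.lstm(context_embs)        # (batch, seq_len, 2*hidden_dim)
            pooled = states.max(dim=1).values          # max-pool over the sequence
            features = torch.cat([pooled, target_emb], dim=-1)  # append the "hint"
            return self.classifier(features)           # logits over candidate senses

For example, UniversalBiLSTMWSD()(torch.randn(4, 30, 200), torch.randn(4, 200)) returns a (4, 2) tensor of sense logits for a batch of four 30-token contexts.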

Highlights

  • In recent years, deep learning methods have been applied to many natural language processing tasks, achieving state-of-the-art performance

  • We provide two approaches: a word sense disambiguation (WSD) method based on a bi-directional long short-term memory (BiLSTM) neural network, and a WSD method based on the self-attention model introduced in [18] (a minimal illustrative sketch of self-attention follows this list)

  • In the Discussion, we provide further analysis and comparison of the WSD methods based on the BiLSTM neural network model and the self-attention model
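
The core of the self-attention model in [18] is scaled dot-product attention. The following is a minimal, illustrative sketch only; the projection matrices and dimensions are placeholders, not the configuration used in the paper.

    import math
    import torch

    def self_attention(x, Wq, Wk, Wv):
        # Scaled dot-product self-attention: every position attends to all
        # others, weighting the value vectors by softmax(QK^T / sqrt(d_k)).
        q, k, v = x @ Wq, x @ Wk, x @ Wv                  # linear projections
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return torch.softmax(scores, dim=-1) @ v          # mixed representations

Because every output position is computed from all input positions in a single matrix product, this layer parallelizes better than a recurrent network, which is consistent with the speed advantage reported for the attention model.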


Introduction

Deep learning methods have been applied to many natural language processing tasks to achieve state-of-the-art performance. In the biomedical domain, however, they have not outperformed supervised word sense disambiguation (WSD) methods based on support vector machines or random forests, possibly due to inherent similarities among medical word senses. As in general-domain natural language processing (NLP), knowledge discovery and information extraction in the biomedical domain require specialized tasks such as syntactic parsing and named entity recognition (NER). The Unified Medical Language System (UMLS), which consists of over 200 biomedical terminologies and ontologies, contains more than ten million terms and three million concepts; terms with the same meaning are mapped to the same concept. Biomedical NER is usually realized by correctly recognizing an entity mentioned in a sentence and mapping it to a concept in the UMLS.
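
To make the disambiguation task concrete, the sketch below frames it as choosing among the candidate UMLS concepts of an ambiguous term; the term, concept identifiers, and scoring interface are all hypothetical placeholders rather than actual UMLS data or the paper's API.

    # Toy framing of biomedical WSD: pick one of a term's candidate concepts.
    CANDIDATE_SENSES = {
        "cold": ["CUI_DISEASE", "CUI_TEMPERATURE"],  # placeholder identifiers
    }

    def disambiguate(term, context_tokens, model):
        # `model.score` is a hypothetical interface returning one score per sense.
        senses = CANDIDATE_SENSES[term.lower()]
        scores = model.score(term, context_tokens, senses)
        best_score, best_sense = max(zip(scores, senses))
        return best_sense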

