Learning Word Embeddings Research Articles

BackgroundData mining of electronic health records (EHRs) has a huge potential for improving clinical decision support and to help healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with various shortcomings related to performance, efficiency, or transparency.MethodsIn this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is automatically learnt, and vocabulary, concepts, and rules supporting a variety of NLP downstream tasks can further be built with only minimal manual feature engineering and tagging required from clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretable nature of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative for healthcare. The combined method also has other advantages, like potentially being language independent, demanding few domain resources for maintenance, and able to cover misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification.ResultsIn empirical studies the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%), and has demonstrated its value to clinical practice. Here we employ a real-life EHR-derived dataset to evaluate the method’s performance on the task of classification (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks.ConclusionsBased on the promising results, we suggest more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP.

Read full abstract

Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing user opinions and applying these to guide choices, making it one of the most popular areas of research in the field of natural language processing. Despite the fact that several languages, including English, have been the subjects of several studies, not much has been conducted in the area of the Arabic language. The morphological complexities and various dialects of the language make semantic analysis particularly challenging. Moreover, the lack of accurate pre-processing tools and limited resources are constraining factors. This novel study was motivated by the accomplishments of deep learning algorithms and word embeddings in the field of English sentiment analysis. Extensive experiments were conducted based on supervised machine learning in which word embeddings were exploited to determine the sentiment of Arabic reviews. Three deep learning algorithms, convolutional neural networks (CNNs), long short-term memory (LSTM), and a hybrid CNN-LSTM, were introduced. The models used features learned by word embeddings such as Word2Vec and fastText rather than hand-crafted features. The models were tested using two benchmark Arabic datasets: Hotel Arabic Reviews Dataset (HARD) for hotel reviews and Large-Scale Arabic Book Reviews (LARB) for book reviews, with different setups. Comparative experiments utilized the three models with two-word embeddings and different setups of the datasets. The main novelty of this study is to explore the effectiveness of using various word embeddings and different setups of benchmark datasets relating to balance, imbalance, and binary and multi-classification aspects. Findings showed that the best results were obtained in most cases when applying the fastText word embedding using the HARD 2-imbalance dataset for all three proposed models: CNN, LSTM, and CNN-LSTM. Further, the proposed CNN model outperformed the LSTM and CNN-LSTM models for the benchmark HARD dataset by achieving 94.69%, 94.63%, and 94.54% accuracy with fastText, respectively. Although the worst results were obtained for the LABR 3-imbalance dataset using both Word2Vec and FastText, they still outperformed other researchers’ state-of-the-art outcomes applying the same dataset.

Read full abstract

Learning Word Embeddings Research Articles

Related Topics

Articles published on Learning Word Embeddings

E-Commerce Fake Reviews Detection Using LSTM with Word2Vec Embedding

An experimental study of sentiment classification using deep-based models with various word embedding techniques

Multi-step Transfer Learning in Natural Language Processing for the Health Domain

Analyzing Reddit Data: Hybrid Model for Depression Sentiment using FastText Embedding

Effect of Supervised Sense Disambiguation Model Using Machine Learning Technique and Word Embedding in Word Sense Disambiguation

Flexible margins and multiple samples learning to enhance lexical semantic similarity

An efficient consolidation of word embedding and deep learning techniques for classifying anticancer peptides: FastText+BiLSTM.

Root Cause Identification of Industrial Alarm Floods Using Word Embedding and Few-Shot Learning

Improving neural machine translation for low resource languages through non-parallel corpora: a case study of Egyptian dialect to modern standard Arabic translation

Event causality extraction through external event knowledge learning and polyhedral word embedding

Diverse ensemble classifier driven Email spam classification using multiple word embedding’s with COCOB optimizer

An empirical assessment of different word embedding and deep learning models for bug assignment

An Effective Knowledgeable Label-Aware Approach for Sentential Relation Extraction

Out-of-vocabulary word embedding learning based on reading comprehension mechanism

Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

Statement Recognition of Access Control Policies in IoT Networks.

Empowering health geography research with location-based social media data: innovative food word expansion and energy density prediction via word embedding and machine learning

Sentiment analysis of imbalanced datasets using BERT and ensemble stacking for deep learning

Artificial intelligence for prediction of International Classification of Disease codes

Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Learning Word Embeddings Research Articles

Related Topics

Articles published on Learning Word Embeddings

E-Commerce Fake Reviews Detection Using LSTM with Word2Vec Embedding

An experimental study of sentiment classification using deep-based models with various word embedding techniques

Multi-step Transfer Learning in Natural Language Processing for the Health Domain

Analyzing Reddit Data: Hybrid Model for Depression Sentiment using FastText Embedding

Effect of Supervised Sense Disambiguation Model Using Machine Learning Technique and Word Embedding in Word Sense Disambiguation

Flexible margins and multiple samples learning to enhance lexical semantic similarity

An efficient consolidation of word embedding and deep learning techniques for classifying anticancer peptides: FastText+BiLSTM.

Root Cause Identification of Industrial Alarm Floods Using Word Embedding and Few-Shot Learning

Improving neural machine translation for low resource languages through non-parallel corpora: a case study of Egyptian dialect to modern standard Arabic translation

Event causality extraction through external event knowledge learning and polyhedral word embedding

Diverse ensemble classifier driven Email spam classification using multiple word embedding’s with COCOB optimizer

An empirical assessment of different word embedding and deep learning models for bug assignment

An Effective Knowledgeable Label-Aware Approach for Sentential Relation Extraction

Out-of-vocabulary word embedding learning based on reading comprehension mechanism

Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

Statement Recognition of Access Control Policies in IoT Networks.

Empowering health geography research with location-based social media data: innovative food word expansion and energy density prediction via word embedding and machine learning

Sentiment analysis of imbalanced datasets using BERT and ensemble stacking for deep learning

Artificial intelligence for prediction of International Classification of Disease codes

Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning