Distributed Word Representations Research Articles

Due to the increasing use of information technologies by biomedical experts, researchers, public health agencies, and healthcare professionals, large volumes of scientific literature, clinical notes, and other structured and unstructured text are being generated and stored in data sources such as PubMed. These massive text resources can be leveraged to extract valuable knowledge and insights using machine learning techniques. Neural network-based classification models have recently gained popularity. These models take numeric vectors of the training data (word representations) as input, and the better the input vectors, the more accurate the classification. Word representations are learned as a distribution of words in an embedding space, in which each word has its own vector and words that are semantically similar, based on their contexts, appear near each other. However, such distributional word representations cannot capture relational semantics between distant words. In the biomedical domain, relation mining is a well-studied problem that aims to extract the relational words that associate distant entities, which generally represent the subject and object of a sentence. Our goal is to capture the relational semantic information between distant words in a large corpus, use it to learn enhanced word representations, and apply those representations to natural language processing tasks such as text classification. In this article, we propose using biomedical relation triplets to learn word representations by incorporating relational semantic information into the distributional representation of words. In other words, the proposed approach captures both the distributional and the relational contexts of words when learning their numeric vectors from a text corpus. We also propose an application of the learned word representations to text classification.
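The difference between the two kinds of context described above can be illustrated with a minimal Python sketch. It assumes that sentences are already tokenized and that relation triplets of the form (subject, relation, object) have been produced by some upstream relation-mining tool. The function names, the window size, and the particular way relational pairs are formed are hypothetical choices for illustration, not the authors' exact method: window-based pairs only link nearby words, while triplet-based pairs link the subject and object however far apart they are.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]
Triplet = Tuple[str, str, str]  # (subject, relation, object), assumed pre-extracted


def distributional_pairs(tokens: List[str], window: int = 2) -> List[Pair]:
    """(target, context) pairs from a symmetric sliding window of the given size."""
    pairs: List[Pair] = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def relational_pairs(triplets: Iterable[Triplet]) -> List[Pair]:
    """Hypothetical relational contexts: subject and object become contexts of each
    other and of the relation word, regardless of their distance in the sentence."""
    pairs: List[Pair] = []
    for subj, rel, obj in triplets:
        pairs += [(subj, obj), (obj, subj), (subj, rel), (obj, rel)]
    return pairs


# Example sentence and a triplet assumed to come from a relation extractor.
tokens = "aspirin significantly reduces the risk of myocardial infarction".split()
triplets = [("aspirin", "reduces", "infarction")]

dist = distributional_pairs(tokens, window=2)
rel = relational_pairs(triplets)

# Co-occurrence counts of this kind are what a GloVe-style model is fitted to
# (GloVe itself additionally down-weights pairs by their distance).
cooccurrence = Counter(dist + rel)
print(("aspirin", "infarction") in dist)        # False: 7 tokens apart, outside the window
print(cooccurrence[("aspirin", "infarction")])  # 1, contributed only by the triplet
```

Pooling both pair sets into one count table is only one possible design; a model could instead keep separate objectives for distributional and relational contexts.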
The proposed approach is evaluated on multiple benchmark datasets, and the efficacy of the learned word representations is tested on word similarity and concept categorization tasks, where our approach performs better than the state-of-the-art GloVe model. Furthermore, we apply the learned word representations to classify biomedical texts using four neural network-based classification models, and the resulting classification accuracy further confirms the effectiveness of the word representations learned by our proposed approach.
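Word similarity benchmarks are commonly scored by computing the cosine similarity of each word pair's vectors and comparing those scores with human similarity ratings using Spearman's rank correlation. The sketch below assumes that protocol; the toy vectors and ratings are invented for illustration and come neither from a real benchmark nor from the article. Concept categorization, the other evaluation task mentioned, is typically scored by clustering the vectors and comparing the clusters with gold categories, which is not shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def word_similarity_score(
    emb: Dict[str, List[float]], benchmark: List[Tuple[str, str, float]]
) -> float:
    """Correlate model similarities with human ratings, skipping out-of-vocabulary pairs."""
    model, human = [], []
    for w1, w2, gold in benchmark:
        if w1 in emb and w2 in emb:
            model.append(cosine(emb[w1], emb[w2]))
            human.append(gold)
    return spearman(model, human)


# Hypothetical 3-dimensional embeddings and human ratings on a 0-10 scale.
emb = {
    "cancer": [1.0, 0.1, 0.0],
    "tumor": [0.9, 0.2, 0.0],
    "aspirin": [0.0, 1.0, 0.2],
    "ibuprofen": [0.1, 0.9, 0.3],
    "hospital": [0.3, 0.3, 0.9],
}
benchmark = [
    ("cancer", "tumor", 9.0),
    ("aspirin", "ibuprofen", 8.5),
    ("cancer", "hospital", 4.0),
    ("aspirin", "cancer", 2.0),
]
score = word_similarity_score(emb, benchmark)
print(score)  # close to 1.0: the toy vectors rank the pairs in the same order as the ratings
```

Rank correlation is used rather than raw Pearson correlation because only the ordering of the pairs is compared, so the scale of cosine scores and human ratings need not match.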

Background: Each year, influenza affects 3 to 5 million people and causes 290,000 to 650,000 deaths worldwide. To reduce these fatalities, several countries have established influenza surveillance systems that collect early-warning data. However, proper and timely warnings are hindered by a 1- to 2-week delay between actual disease outbreaks and the publication of surveillance data. To address this issue, novel methods for influenza surveillance and prediction using real-time internet data (such as search queries, microblogs, and news) have been proposed. Some currently popular approaches extract online data and use machine learning to predict influenza occurrences as a classification task. However, many of these methods select training data subjectively, which makes it difficult to capture the latent characteristics of the data correctly. There is a critical need for new approaches that extract training data in a way that reflects those latent characteristics.
Objective: In this paper, we propose an effective method for extracting training data that reflects the hidden features of the data and improves performance by filtering and selecting only influenza-related keywords before prediction.
Methods: Word embedding provides a distributed representation of words by encoding hidden relationships between tokens. We enhanced the word embeddings by selecting keywords related to the influenza outbreak and sorting the extracted keywords by their Pearson correlation coefficient with the actual influenza outbreak, keeping only the tokens that are highly correlated with it. The keyword extraction process was followed by a predictive model based on long short-term memory (LSTM) that predicts influenza outbreaks.
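The keyword-sorting step can be sketched as follows. The sketch assumes that each candidate keyword (for example, a nearest neighbor of "influenza" in the embedding space) has already been turned into a weekly frequency series aligned with weekly influenza counts. The series, the 0.5 threshold, and the optional top-k cut-off are hypothetical; the abstract does not state which cut-off was used.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def select_keywords(
    keyword_series: Dict[str, Sequence[float]],
    outbreak: Sequence[float],
    threshold: float = 0.5,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Sort keywords by correlation with the outbreak series and keep the strongest."""
    scored = [(kw, pearson(series, outbreak)) for kw, series in keyword_series.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest correlation first
    kept = [(kw, r) for kw, r in scored if r >= threshold]
    return kept if top_k is None else kept[:top_k]


# Hypothetical weekly data: influenza case counts and keyword mentions in news articles.
outbreak = [10, 20, 40, 80, 60, 30]
candidates = {
    "fever": [12, 22, 38, 75, 65, 28],   # tracks the outbreak closely
    "vaccine": [5, 5, 6, 5, 6, 5],       # only weakly correlated
    "school": [30, 25, 20, 10, 15, 25],  # moves in the opposite direction
}

selected = select_keywords(candidates, outbreak)
print([kw for kw, _ in selected])  # ['fever']
```

Only the retained keywords would then be used as input features for the prediction model, which is how the selection can shrink the training data while keeping the informative signals.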
To assess the performance of the proposed predictive model, we used and compared a variety of word embedding techniques.
Results: Word embedding without our proposed sorting process achieved a prediction accuracy of 0.8705 when 50.2 keywords were selected on average. Conversely, word embedding with our proposed sorting process achieved a prediction accuracy of 0.8868 and showed an improvement in prediction accuracy of 12.6%, even though less training data was selected, with only 20.6 keywords on average.
Conclusions: The sorting stage strengthens the embedding process and improves feature extraction, because it acts as a knowledge base for the prediction component. The model outperformed other current approaches that use flat extraction before prediction.
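The long short-term memory (LSTM) predictor mentioned in the Methods can be illustrated with a minimal, untrained forward pass in pure Python. This is a generic LSTM cell with the standard input, forget, and output gates, followed by a sigmoid output unit. It assumes that each time step is a vector of weekly frequencies for the selected keywords and that the target is a binary outbreak label; the article's actual architecture, layer sizes, and training procedure are not given in the abstract, and training (backpropagation through time) is omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """Generic LSTM cell (standard gate equations), randomly initialised and untrained."""

    GATES = ("i", "f", "o", "g")  # input, forget, output, candidate

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0, scale: float = 0.1):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: input weights, recurrent weights, bias.
        self.params: Dict[str, Tuple[Matrix, Matrix, Vector]] = {
            gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in self.GATES
        }

    def _preactivation(self, gate: str, x: Sequence[float], h: Sequence[float]) -> Vector:
        W, U, b = self.params[gate]
        return [wx + uh + bk for wx, uh, bk in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x: Sequence[float], h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        i = [sigmoid(z) for z in self._preactivation("i", x, h)]
        f = [sigmoid(z) for z in self._preactivation("f", x, h)]
        o = [sigmoid(z) for z in self._preactivation("o", x, h)]
        g = [math.tanh(z) for z in self._preactivation("g", x, h)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # c_t = f*c_prev + i*g
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]          # h_t = o*tanh(c_t)
        return h_new, c_new


def predict_outbreak(
    cell: LSTMCell, w_out: Sequence[float], b_out: float, sequence: Sequence[Sequence[float]]
) -> float:
    """Run the cell over the weekly sequence; map the last hidden state to a probability."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in sequence:
        h, c = cell.step(x, h, c)
    return sigmoid(sum(w * hk for w, hk in zip(w_out, h)) + b_out)


cell = LSTMCell(input_size=2, hidden_size=4, seed=42)
weeks = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # hypothetical normalised keyword frequencies
p = predict_outbreak(cell, w_out=[0.5] * 4, b_out=0.0, sequence=weeks)
print(f"outbreak probability (untrained): {p:.3f}")
```

In practice such a model would be built and trained with a deep learning framework rather than by hand; the pure-Python version only makes the gate arithmetic explicit.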

Articles published on Distributed Word Representations

A comprehensive analysis of static word embeddings for Turkish

Exploring Latent Themes-Analysis of various Topic Modelling Algorithms

Biomedical Text Classification Using Augmented Word Representation Based on Distributional and Relational Contexts

Refining Word Embeddings with Sentiment Information for Sentiment Analysis

LARQS: An Analogical Reasoning Evaluation Dataset for Legal Word Embedding

Refining electronic medical records representation in manifold subspace

Extracting value from Brazilian Court decisions

A Scheme of Pairwise Feature Combinations to Improve Sentiment Classification Using Book Review Dataset

A Classification Framework of Identifying Major Documents with Search Engine Suggests and Unsupervised Subtopic Clustering

Learning word representation by jointly using neighbor and syntactic contexts

Effective Training Data Extraction Method to Improve Influenza Outbreak Prediction from Online News Articles: Deep Learning Model Study

A Classification Framework of Identifying Major Documents With Search Engine Suggestions and Unsupervised Subtopic Clustering

Pre-training Context and Time Aware Location Embeddings from Spatial-Temporal Trajectories for User Next Location Prediction

Probing Lexical Ambiguity: Word Vectors Encode Number and Relatedness of Senses

Efficient estimation of Hindi WSD with distributed word representation in vector space

Improving biomedical word representation with locally linear embedding

Word Embeddings for Constructive Comments Classification

Distributional Word Representations for Code-mixed Text in Moroccan Darija

Example Phrase Adaptation Method for Customized, Example-Based Dialog System Using User Data and Distributed Word Representations

Unsupervised Predominant Sense Detection and Its Application to Text Classification