Contextualized Word Embeddings Research Articles

Social networks have become information dissemination channels where announcements are posted frequently; they also serve as frameworks for debates in various areas (e.g., scientific, political, and social). In the health area in particular, social networks are a channel for communicating and disseminating the success of novel treatments; they also allow ordinary people to express their concerns about a disease or disorder. The Artificial Intelligence (AI) community has developed analytical methods to uncover and predict patterns from posts that enable it to explain news about a particular topic, e.g., mental disorders expressed as eating disorders or depression. Although posts can be rich in expressing an idea or concern, they are short texts, which prevents AI models from accurately encoding their contextual knowledge. We propose a hybrid approach where knowledge encoded in community-maintained knowledge graphs (e.g., Wikidata) is combined with deep learning to categorize social media posts using existing classification models. The proposed approach resorts to state-of-the-art named entity recognizers and linkers (e.g., Falcon 2.0) to extract entities in short posts and link them to concepts in knowledge graphs. Then, knowledge graph embeddings (KGEs) are used to compute latent representations of the extracted entities, which yield vector representations of the posts that encode the contextual knowledge about these entities extracted from the knowledge graphs. These KGEs are combined with contextualized word embeddings (e.g., BERT) to generate a context-based representation of the posts that empowers prediction models. We apply our proposed approach in the health domain to detect whether a publication is related to an eating disorder (e.g., anorexia or bulimia) and uncover concepts within the discourse that could help healthcare providers diagnose this type of mental disorder. We evaluate our approach on a dataset of 2,000 tweets about eating disorders. Our experimental results suggest that combining the contextual knowledge encoded in word embeddings with that built from knowledge graphs increases the reliability of the predictive models. The ambition is that the proposed method can support health domain experts in discovering patterns that may forecast a mental disorder, enhancing early detection and more precise diagnosis towards personalized medicine.
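The abstract above does not say how the KGEs and the contextualized word embeddings are combined. One common option, assumed here purely for illustration, is to mean-pool the embeddings of the entities linked in a post and concatenate the result with the post's text embedding. The sketch below uses toy vectors in place of real BERT and Wikidata embeddings; the function names `mean_pool` and `fuse` and all numeric values are hypothetical.

```python
from statistics import fmean


def mean_pool(vectors):
    """Average a list of equal-length vectors component-wise."""
    return [fmean(component) for component in zip(*vectors)]


def fuse(text_vec, entity_vecs, kge_dim):
    """Concatenate a text embedding with the mean of its entity embeddings.

    Assumption: concatenation is the fusion step. Posts with no linked
    entities get a zero vector for the KGE part, so every fused vector
    has the same length.
    """
    kge_part = mean_pool(entity_vecs) if entity_vecs else [0.0] * kge_dim
    return list(text_vec) + kge_part


# Toy stand-ins (hypothetical values): a 3-d "text" vector and two 2-d KGEs.
text_vec = [0.2, -0.1, 0.7]
entity_vecs = [[1.0, 0.0], [0.0, 1.0]]

fused = fuse(text_vec, entity_vecs, kge_dim=2)
print(fused)
```

The fused vector could then be passed to any existing classifier. Padding with zeros for posts without linked entities is one simple way to keep the input dimension fixed; it is a design choice of this sketch, not something the abstract specifies.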

Purpose: When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to grouping proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.

Design/methodology/approach: This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.

Findings: The results show that the generated word embeddings can effectively represent proposal texts as vectors and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.

Research limitations/implications: The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.

Originality/value: This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing research studies focus on the automatic grouping of proposals and use traditional frequency-based word embedding methods for feature extraction to represent project proposals. Unlike previous research, this study employs two high-performing neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, there has been no research conducted on the grouping of project proposals in Turkish.
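As a rough illustration of the pipeline this abstract describes (texts turned into vectors, vectors fed to a clustering algorithm), the sketch below builds TF/IDF vectors in plain Python and groups them with a small k-means variant that assigns documents by cosine similarity. It is not the study's implementation: the four English proposal titles are invented toy data, the seed documents are fixed by hand for reproducibility, and the smoothed IDF formula is an assumption of this sketch.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one sparse TF/IDF dict per document (smoothed IDF, assumed)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, seeds, iterations=10):
    """k-means variant: assign each vector to its most cosine-similar centroid."""
    centroids = [vectors[i] for i in seeds]
    labels = []
    for _ in range(iterations):
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for vec in vectors
        ]
        for k in range(len(centroids)):
            members = [vec for vec, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = centroid(members)
    return labels


# Invented toy proposals (hypothetical, not the study's data).
docs = [
    "solar energy storage for rural villages",
    "wind and solar energy grid integration",
    "vocational training for young job seekers",
    "digital skills training for job seekers",
]
labels = kmeans(tfidf(docs), seeds=[0, 2])
print(labels)
```

With these toy inputs the two energy proposals and the two training proposals end up in separate groups. Swapping the TF/IDF step for FastText or BERT vectors, or the clustering step for a classifier such as a linear SVM, would follow the same vectors-in, labels-out shape.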

Articles published on Contextualized Word Embeddings

Personalized Query Expansion with Contextual Word Embeddings

Shahmukhi named entity recognition by using contextualized word embeddings

Natural language processing (NLP) aided qualitative method in health research

Multilabel Classification for Keyword Determination of Scientific Articles

The Contribution of Selected Linguistic Markers for Unsupervised Arabic Verb Sense Disambiguation

ADscreen: A speech processing-based screening system for automatic identification of patients with Alzheimer's disease and related dementia

A benchmark for evaluating Arabic contextualized word embedding models

ADEPT: A DEbiasing PrompT Framework

Methods of Annotating and Identifying Metaphors in the Field of Natural Language Processing

Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts

A Comparative Analysis of Word Embedding and Deep Learning for Arabic Sentiment Classification

A comparative analysis of text representation, classification and clustering methods over real project proposals

Data-driven dependency parsing of Vedic Sanskrit

Examining the effect of whitening on static and contextualized word embeddings

Semantic Role Labeling for Amharic Text Using Multiple Embeddings and Deep Neural Network

The VNNLI - VLSP 2021: Leveraging Contextual Word Embedding for NLI Task on Bilingual Dataset

A Comprehensive Analysis of Transformer-Deep Neural Network Models in Twitter Disaster Detection

Contextual word embeddings for tabular data search and integration

Temporal disambiguation of relative temporal expressions in clinical texts.

Phenotyping in clinical text with unsupervised numerical reasoning for patient stratification.
