Drink2Vec: Improving the classification of alcohol-related tweets using distributional semantics and external contextual enrichment

Marcos Grzeça, Karin Becker, Renata Galante

doi:10.1016/j.ipm.2020.102369

Abstract

The hazardous and harmful use of alcohol has become a public health issue worldwide. Social media has emerged as a reliable source for extracting information on alcohol consumption at low cost and with low latency. The automatic classification of tweets related to alcohol consumption can help to understand the factors related to alcohol abuse. In this paper, we propose Drink2Vec, a method aimed at improving the classification of alcohol-related tweets by exploring two forms of contextual information: distributional semantics and external contextual enrichment. The core of Drink2Vec is a convolutional neural network that learns domain-specific word embeddings capturing the vocabulary related to alcohol consumption. Drink2Vec builds on Drink2Symbol, a method that finds relevant symbolic features in external sources (e.g., the Semantic Web) to provide meaning and generalization to the terms present in tweets. Based on five datasets and three classification algorithms, our experiments show that external enrichment improves recall by adding generalization features, while distributional semantics improves precision, mainly by characterizing terms according to their usage. A stacking ensemble of these classifiers balances the advantages of each contextual enrichment technique. Our experiments also suggest that the task-specific embeddings produced by Drink2Vec capture more nuances of the informal vocabulary related to alcohol consumption (e.g., slang, events, misspelled words) and yield better results than other strategies (e.g., pre-trained embeddings and generic algorithms).

Full Text