Learning Domain-Specific Word Embeddings from COVID-19 Tweets

Steve Aibuedefe Aigbe, Christoph Eick

doi:10.1109/bigdata52589.2021.9671817

Abstract

The COVID-19 global pandemic has been a major catastrophic event that impacted the world’s economy. During the pandemic, people increasingly used social media such as Twitter to express their reactions and responses to it. This drove researchers to analyze these micro-blogging texts with natural language processing (NLP) methods to understand the information inherent in them. Most of these NLP tasks use word embeddings when training neural network models. These word embeddings are mainly trained on general text corpora, which leads to sub-optimal performance in domain-specific NLP tasks such as those involving COVID-19 related tweets. In this paper, we present domain-specific word embeddings learned from COVID-19 tweets for use in NLP tasks on COVID-19 related tweets. Our evaluation results show that these domain-specific embeddings perform better than pretrained general word embeddings in a downstream domain-specific NLP task. The embeddings are available to researchers who wish to perform downstream NLP tasks with pretrained domain-specific COVID-19 tweet word embeddings.

Full Text