Building and Testing Fine-Grained Dataset of COVID-19 Tweets for Worry Prediction

Tahani Soud Alharbi, Fethi Fkih

doi:10.14569/ijacsa.2022.0130874

Abstract

The COVID-19 outbreak has caused loss of life worldwide and has heightened worry about life, public health, the economy, and the future. With lockdown and social distancing measures in place, people turned to social media such as Twitter to share their feelings and concerns about the pandemic. Several studies have analyzed Twitter users’ sentiments and emotions. However, little work has addressed worry detection at a fine-grained level, owing to the lack of adequate datasets. Worry is associated with notions such as anxiety, fear, and nervousness. In this study, we built a dataset for worry classification called “WorryCov”. It is a relatively large dataset derived from Twitter concerning worry about COVID-19. The data were annotated at three levels (“no-worry”, “worry”, and “high-worry”). Using the annotated dataset, we investigated the performance of different machine learning (ML) algorithms, including multinomial Naïve Bayes (MNB), support vector machine (SVM), logistic regression (LR), and random forest (RF). The results show that LR performed best, with an accuracy of 75%. Furthermore, the results indicate that the proposed model could be used by psychologists and researchers to predict Twitter users’ worry levels during COVID-19 or similar crises.

Full Text