Abstract

Ever since COVID-19 was declared a pandemic, governments around the world have implemented numerous phases of lockdown measures to curb the spread of the virus. These lockdown measures have been accompanied by widespread fear and panic, driven by social media discussions. Given that individuals hold diverse opinions about these lockdown measures during and after their completion, positive and negative lockdown-related discussions should be differentiated to better understand the major related issues and to inform appropriate messaging and policy choices in the future. We conduct a sentiment analysis (SA) of COVID-19-lockdown-related tweets using different machine learning (ML) classifiers and evaluate their performance before and after applying the synthetic minority oversampling technique (SMOTE). The research is performed in five phases: data collection, pre-processing of the dataset, preparation of the dataset by annotation, application of SMOTE and use of ML classifiers. After SMOTE is applied, we observe an improvement in accuracy (Acc), as confirmed by the Matthews correlation coefficient (MCC), across most classifiers, except for the k-nearest neighbour (KNN), whose Acc decreased from 0.82 to 0.59 and whose MCC decreased from 0.544 to 0.279. Despite the potential of SMOTE with some classifiers, this technique cannot be considered an ultimate solution, especially with other classifiers and datasets. The study highlights the need to evaluate and benchmark the integration of data balancing approaches with ML classifiers and to consider additional metrics, such as MCC, for binary classification problems, especially in SA.
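
To make the two central ideas concrete, the following is a minimal, standard-library Python sketch rather than the study's actual pipeline. The function names (`smote_sketch`, `accuracy`, `mcc`), the toy points and the confusion-matrix counts are all hypothetical and chosen only for illustration. The first function shows the interpolation idea behind SMOTE (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours); the other two show why MCC is a useful check on accuracy when classes are imbalanced.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea).
    Simplified illustration only, not a full SMOTE implementation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical minority-class points in a 2-D feature space.
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(smote_sketch(minority, n_new=3))

    # Hypothetical classifier that labels every one of 100 tweets as
    # positive when 90 are truly positive and 10 are truly negative.
    print(accuracy(tp=90, tn=0, fp=10, fn=0))
    print(mcc(tp=90, tn=0, fp=10, fn=0))
```

In the hypothetical all-positive case, accuracy looks high while MCC is zero, which is the kind of discrepancy that motivates reporting MCC alongside Acc for imbalanced binary problems.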
