The Improvement of Stress Level Detection in Twitter: Imbalance Classification Using SMOTE

Mohd Shahrul Nizam Mohd Danuri, Azzan Amin, Rohizah Abd Rahman, Ibrahim Mohamed

doi:10.1109/icoco56118.2022.10031684

https://doi.org/10.1109/icoco56118.2022.10031684

Abstract

This study developed a model to improve stress level detection on imbalanced data using the Synthetic Minority Oversampling Technique (SMOTE). SMOTE addresses imbalanced datasets by oversampling the minority class. Data collected from Twitter can seem vague, mainly because of the massive volume of data. This research used a framework model consisting of Data, Experts Data Annotation, Text Pre-processing, and Text Representation and Classification. Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and Lemma were used for the text representation. The data were collected only from Twitter under certain circumstances. Subject Matter Experts (SMEs) on mental health problems annotated the text of the tweets according to four levels: Normal, Mild, Moderate, and Severe. The data group for the Normal stress level was relatively large compared to the other groups. Because of this imbalance, the SMOTE technique was used for data augmentation. The results showed that classification using a Support Vector Machine with SMOTE improved over the baseline: increasing the cardinality of the minority class labels produced significant gains in Macro Avg Recall and Macro Avg F1-Score.
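The oversampling step described in the abstract can be illustrated with a minimal sketch of SMOTE's core idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy 2-D vectors are hypothetical and stand in for real text features such as TF-IDF weights.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; not the paper's code.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # gap in [0, 1) picks a position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy feature vectors (assumed data): 10 "Normal" samples vs. 3 "Severe" samples.
normal = [[0.1 * i, 0.2 * i] for i in range(10)]
severe = [[5.0, 5.0], [5.5, 4.5], [6.0, 5.0]]

# Oversample the minority class until it matches the majority class size.
new_severe = smote(severe, n_new=len(normal) - len(severe), k=2)
balanced_severe = severe + new_severe
print(len(normal), len(balanced_severe))
```

Because every synthetic point lies between two existing minority samples, the minority class grows without exact duplication. In practice, oversampling is normally applied only to the training split so that evaluation metrics such as macro-averaged recall are computed on untouched data.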