Clustering Analysis of Tweets About COVID-19 Using the K-Means Algorithm

Andi Andi, Carles Juliandy, David David

doi:10.33395/sinkron.v8i1.12145

Abstract

From 2020 to 2022, Coronavirus Disease 2019 (COVID-19) was one of the trending topics on Twitter. The many tweets about COVID-19 appear mixed together rather than grouped, which makes it difficult for Twitter users to read them and sort them by the information they want. One way to address this problem is to cluster tweets about COVID-19. This study takes a quantitative approach and applies K-Means, a clustering method for grouping data. The data consist of 4,103 tweets drawn from the Kaggle dataset Omicron-Covid-19 Variant Tweets and from a scraping process with Bright Data. The Elbow method indicated that the best number of clusters for this dataset was k = 5. Clustering the tweets with K-Means at k = 5 produced the following cluster sizes, from largest to smallest: cluster 4 with 1,185 tweets, cluster 1 with 1,047 tweets, cluster 2 with 757 tweets, cluster 3 with 744 tweets, and cluster 5 with 370 tweets.
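The workflow named in the abstract (K-Means, choosing k with the Elbow method, then counting cluster members) can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not say how tweets were turned into numeric vectors, so the sketch assumes each tweet has already been converted to a feature vector and uses synthetic 2-D points in their place. The function names `squared_distance`, `assign`, and `kmeans`, the toy data, and the choice of k = 3 for the toy data are all hypothetical. The Elbow method is represented by the within-cluster sum of squares (WCSS) computed for several values of k; the "elbow" is the k after which WCSS stops dropping sharply.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style K-Means; returns (labels, centroids, wcss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly chosen points as initial centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                updated.append(centroids[c])
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


# Synthetic stand-in for tweet feature vectors (assumption: three loose groups).
data_rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(20)
]

# Elbow method: WCSS for each candidate k; inspect where the curve flattens.
wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 7)}
for k, wcss in wcss_by_k.items():
    print(f"k={k}: WCSS={wcss:.2f}")

# Cluster the toy data at a chosen k and count members per cluster.
labels, centroids, _ = kmeans(points, 3)
sizes = Counter(labels)
print(sizes.most_common())
```

With real tweets, `points` would be replaced by the vectorized tweet texts, and the member counts in `sizes` would correspond to the per-cluster tweet counts reported in the abstract.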
