Clustering Tweets Data on Twitter Social Media Using K-Means Method

Dewi Fatmarani Surianto

doi:10.61220/scientist.v1i2.20232

Abstract

Twitter, a popular social media platform with a large global user base, stores a wide variety of information. This study uses text mining to analyze tweet content by applying a clustering technique, specifically the K-Means algorithm. The implementation involves several text preprocessing stages: case folding, tokenizing, stopword removal, and stemming. Feature extraction then produces the input for the K-Means algorithm, and the resulting clusters are evaluated with the Silhouette coefficient. The test results show that the silhouette value varies with the choice of K. In one test scenario, K=2 produced a silhouette value of 0.5000421, K=5 a value of 0.0501051, and K=9 a value of 0.501893. The highest of these values fall in the range of 0.5 to 0.7, so the cluster structure of the dataset can be categorized as medium. These results show that cluster quality depends on the choice of K, with the silhouette value serving as the main measure of that quality.
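The stages named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: the stopword list, the suffix-stripping "stemmer", the raw term-frequency features, the deterministic centroid initialization, and the four sample tweets are all assumptions made for illustration, and the abstract does not say which feature extraction method or stemmer was actually used.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources; a real study would use a language-specific
# stopword list and a proper stemmer.
STOPWORDS = {"the", "is", "a", "an", "and", "to", "of", "in", "on", "for"}
SUFFIXES = ("ing", "ed", "s")

def preprocess(tweet):
    """Case folding, tokenizing, stopword removal, and naive stemming."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    stemmed = []
    for t in tokens:
        for suf in SUFFIXES:
            # Strip one suffix only if at least 3 characters remain.
            if t.endswith(suf) and len(t) - len(suf) >= 3:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed

def vectorize(docs):
    """Feature extraction: raw term-frequency vectors over a shared vocabulary."""
    vocab = sorted({t for d in docs for t in d})
    return [[Counter(d)[t] for t in vocab] for d in docs]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Basic K-Means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Made-up sample tweets, for illustration only.
tweets = [
    "Loving the new phone camera",
    "Traffic jam on the highway again",
    "The phone camera is amazing",
    "Highway traffic is terrible today",
]
points = vectorize([preprocess(t) for t in tweets])
labels = kmeans(points, k=2)
score = silhouette(points, labels)
print(labels, round(score, 4))
```

Running the same evaluation for several values of K and comparing the resulting silhouette values mirrors the comparison reported in the abstract. On real tweet data, random or k-means++ initialization with several restarts would usually be preferable to the fixed initialization used here.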
