Abstract

Twitter, a popular social media platform with millions of users worldwide, holds a wide variety of information. This study applies text mining to analyze tweet content through clustering, specifically with the K-Means algorithm. The implementation involves several text preprocessing stages: case folding, tokenizing, stopword removal, and stemming. Feature extraction then produces the input for the K-Means algorithm. Clustering quality is evaluated with the Silhouette coefficient. The test results show that the silhouette value varies with the value of K. In one test scenario, K=2 gave a silhouette value of 0.5000421, K=5 gave 0.0501051, and K=9 gave 0.501893. Based on these values, the dataset can be categorized as having medium structure, because the silhouette values for K=2 and K=9 fall in the range of 0.5 to 0.7. These results show that cluster quality, as measured by the silhouette value, depends on the choice of K.
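The pipeline described above can be illustrated with a minimal sketch in plain Python. It is not the study's implementation: the stopword list, the suffix-stripping stemmer, the term-frequency features, the sample tweets, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random
import re
from collections import Counter

# Assumed toy resources; the study's actual stopword list and stemmer are not given.
STOPWORDS = {"the", "a", "is", "are", "and", "to", "of", "in", "on", "my", "i"}
SUFFIXES = ("ing", "ed", "s")

def preprocess(tweet):
    """Case folding, tokenizing, stopword removal, then naive suffix stripping."""
    tokens = re.findall(r"[a-z]+", tweet.casefold())
    stems = []
    for tok in tokens:
        if tok in STOPWORDS:
            continue
        for suf in SUFFIXES:
            # Strip at most one suffix and keep a stem of at least 3 letters.
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        stems.append(tok)
    return stems

def vectorize(docs):
    """Term-frequency vectors over a shared vocabulary (assumed feature extraction)."""
    vocab = sorted({t for d in docs for t in d})
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([counts[t] for t in vocab])
    return vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50, seed=0):
    """Basic K-Means with random initial centroids; returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: s(i) is taken as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical sample tweets, used only to exercise the pipeline.
tweets = [
    "Loving the new phone camera",
    "The phone camera is amazing",
    "New camera on my phone",
    "Traffic jam in the city today",
    "Stuck in city traffic again",
    "City traffic is terrible today",
]
points = vectorize([preprocess(t) for t in tweets])
for k in (2, 3):
    labels = kmeans(points, k)
    print(k, labels, round(silhouette(points, labels), 3))
```

Running the loop for several values of K and comparing the printed silhouette values mirrors the evaluation reported above. Because K-Means depends on its random initialization, the scores can change with the `seed` argument.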
