DBSCAN algorithm: Twitter text clustering of trend topic Pilkada Pekanbaru

Mustakim Mustakim, Reza Nurul Gayatri Indah, Tuti Andriani, Robbi Rahim, Suwanto Sanjaya, Yulia Novita, Hasbullah Hasbullah, Wardani Purnama Sari, Rian Vebrianto, Oktaf Brillian Kharisma, Rice Novita

doi:10.1088/1742-6596/1363/1/012001

Abstract

Social media platforms such as Twitter are among the most common channels of communication. Every tweet contains data, such as text, that can be collected and processed into information. Processed tweet data reveals trends that can inform fields such as education, economics, and politics, which motivates the use of text mining. Text mining techniques are needed to find interesting patterns when searching for trends in Twitter text on topics related to Pilkada Pekanbaru 2017. This research clusters Twitter text data using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. Several experiments were conducted with different Eps and MinPts parameters on 2,184 text records that had passed through several stages: cleaning, duplicate removal, and pre-processing such as stemming and stopword removal. Based on the highest average Silhouette Index (SI), Eps = 0.1 and MinPts = 10, with SI = 0.413, were chosen as parameters, forming 31 clusters. By frequency of word occurrence in the clusters, the most frequent word is “kpu”, followed by “firdaus”, “kota”, “pasang”, and “ayat”. The candidate pair that appears most often in the clustering results is Firdaus-Ayat, and in the 2017 Pilkada results Firdaus-Ayat was elected Mayor and Vice Mayor of Pekanbaru.
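
The workflow summarized above (cluster with DBSCAN for a given Eps and MinPts, then score the result with the average Silhouette Index) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not state the text representation, the distance measure, or how noise points are treated in the SI, so the bag-of-words vectors, the cosine distance, the exclusion of noise from the SI, the toy documents, and MinPts = 3 (instead of the paper's 10) are all assumptions made for illustration.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    """1 - cosine similarity of two bag-of-words Counters (assumed metric)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return max(0.0, 1.0 - dot / norm) if norm else 1.0

def dbscan(points, eps, min_pts, dist):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # All points within eps of point i (includes i itself).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisional noise, may become border
            continue
        cluster += 1                    # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:   # j is also core: keep expanding
                queue.extend(reach)
    return labels

def mean_silhouette(points, labels, dist):
    """Average Silhouette Index over clustered points (noise excluded)."""
    clusters = {}
    for i, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return None                     # SI is undefined for fewer than 2 clusters
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            a = sum(dist(points[i], points[j]) for j in members if j != i) \
                / (len(members) - 1)
            b = min(sum(dist(points[i], points[j]) for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical, already pre-processed tweets (not from the paper's data set).
docs = [
    "kpu kota pekanbaru", "kpu kota pekanbaru", "kpu pekanbaru kota",
    "firdaus ayat pasang", "firdaus ayat pasang", "pasang firdaus ayat",
    "macet jalan hujan",
]
vectors = [Counter(d.split()) for d in docs]

labels = dbscan(vectors, eps=0.1, min_pts=3, dist=cosine_distance)
si = mean_silhouette(vectors, labels, cosine_distance)
word_freq = Counter(w for d, lab in zip(docs, labels) if lab != -1
                    for w in d.split())
print(labels, si, word_freq.most_common(3))
```

To mirror the parameter search described in the abstract, the last step would be repeated over a grid of Eps and MinPts values, keeping the pair with the highest average SI.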

Full Text