A novel method for clustering tweets in Twitter

Shanmugam Poomagal, Palanisamy Visalakshi, Thiagarajan Hamsapriya

doi:10.1504/ijwbc.2015.068540

Abstract

Twitter, a popular social networking service, is used to post short messages that may be useful to other users. Researchers have analysed these messages in different ways. This paper proposes a technique for clustering tweets posted on Twitter. The aim of this clustering is to identify groups of similar tweets, and this information can be used to identify user communities. These communities can then be recommended to advertisers on Twitter by matching the communities' topics of interest with the advertisers' fields. The Suffix Tree Clustering (STC) algorithm is a core web document clustering algorithm that groups similar documents into clusters by constructing a suffix tree. We used STC together with the semantic similarity among posted tweets to identify topics of interest. The proposed method is compared with the STC and Lingo algorithms using intra-cluster distance and inter-cluster distance. The results show that the proposed method outperforms these existing methods, with a 10.59% reduction in intra-cluster distance and a 44.99% increase in inter-cluster distance.
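The suffix-tree grouping step that the abstract attributes to STC can be sketched as follows. This is a minimal Python sketch of the classic STC idea under stated assumptions, not the authors' method: it omits the semantic-similarity component, which the abstract does not specify. Instead of building an explicit suffix tree, it enumerates every contiguous word phrase (each is a prefix of some word-level suffix, so the phrase-to-tweets mapping is the same), treats phrases shared by at least two tweets as base clusters, scores them by tweet count and phrase length, and merges base clusters whose tweet sets overlap by more than half in both directions. The stop-word list, scoring weights, `top_k` cap and all function names are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Tiny illustrative stop-word list (assumption; a real system would use a fuller one).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to", "was", "who", "will"}


def tokenize(text):
    """Lower-case, split on whitespace and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def base_clusters(docs, min_docs=2):
    """Return (phrase, doc_ids) pairs for phrases shared by >= min_docs tweets."""
    phrase_docs = defaultdict(set)
    for i, doc in enumerate(docs):
        words = tokenize(doc)
        # Every contiguous word sequence is a prefix of some word-level suffix,
        # so this yields the phrase -> tweets mapping a suffix tree exposes.
        for start in range(len(words)):
            for end in range(start + 1, len(words) + 1):
                phrase_docs[tuple(words[start:end])].add(i)
    # Phrases with identical tweet sets would be merged later anyway,
    # so keep one per set, labelled with the longest phrase.
    best = {}
    for phrase, ids in phrase_docs.items():
        if len(ids) < min_docs:
            continue
        key = frozenset(ids)
        if key not in best or len(phrase) > len(best[key]):
            best[key] = phrase
    return [(phrase, set(ids)) for ids, phrase in best.items()]


def score(phrase, ids):
    """Simplified STC-style score: more tweets and longer phrases score higher."""
    weight = 0.5 if len(phrase) == 1 else min(len(phrase), 6)  # hypothetical weights
    return len(ids) * weight


def merge(bases, threshold=0.5):
    """Join base clusters whose mutual overlap exceeds `threshold` in both directions."""
    parent = list(range(len(bases)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(range(len(bases)), 2):
        docs_a, docs_b = bases[a][1], bases[b][1]
        common = len(docs_a & docs_b)
        if common / len(docs_a) > threshold and common / len(docs_b) > threshold:
            parent[find(a)] = find(b)

    # Collect the tweets and phrase labels of each connected component.
    clusters = defaultdict(lambda: {"docs": set(), "labels": []})
    for idx, (phrase, ids) in enumerate(bases):
        group = clusters[find(idx)]
        group["docs"] |= ids
        group["labels"].append(" ".join(phrase))
    return list(clusters.values())


def stc(docs, top_k=500, min_docs=2):
    """Score base clusters, keep the top_k best (illustrative cap), then merge."""
    bases = sorted(base_clusters(docs, min_docs), key=lambda b: score(*b), reverse=True)
    return merge(bases[:top_k])


if __name__ == "__main__":
    tweets = [
        "new iphone launch event today",
        "watching the iphone launch event live",
        "iphone launch event was amazing",
        "football world cup final tonight",
        "who will win the world cup final",
        "iphone price leak",
    ]
    for cluster in stc(tweets):
        print(sorted(cluster["docs"]), cluster["labels"])
    # [0, 1, 2, 5] ['iphone launch event', 'iphone']
    # [3, 4] ['world cup final']
```

Here the base clusters "iphone launch event" (tweets 0, 1, 2) and "iphone" (tweets 0, 1, 2, 5) overlap by 3/3 and 3/4, so they merge into one topic. Enumerating phrases is quadratic in tweet length, which is acceptable for short messages; a real generalised suffix tree stores the same information in space linear in the total text length.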
