Clustering Model for Microblogging Sites using Dimension Reduction Techniques

Soumi Dutta, Asit Kumar Das, Nilan Saha, Saptarshi Ghosh

doi:10.4018/ijismd.2019040102

Abstract

In recent times, microblogging sites such as Twitter have become popular platforms for exchanging information. A reasonably active Twitter user can easily receive hundreds of microblogs (tweets) in their timeline every day. Moreover, because of retweeting and re-posting, a large number of these tweets contain essentially the same information. This volume of repetitive data can cause information overload, since no user can effectively process so much data, and methodologies to manage this overload need to be developed. One effective method for managing information overload on Twitter is to cluster semantically similar tweets into groups, so that a user needs to see only a few tweets from each group. In this work, various graph clustering approaches based on dimension reduction are proposed for clustering microblogs. Through experiments on several microblog datasets, the authors demonstrate that the proposed techniques perform better than several classical text clustering algorithms.

Full Text