ClusTop: A clustering-based topic modelling algorithm for Twitter using word networks

Kwan Hui Lim, Aaron Harwood, Shanika Karunasekera

doi:10.1109/bigdata.2017.8258147

Abstract

Twitter is a popular microblogging service where users frequently discuss various topics of interest, ranging from popular topics (e.g., music) to niche topics (e.g., politics). Given the large volume of tweets, a key challenge is to automatically model and determine the discussion topics without prior knowledge of the types and number of topics, and without requiring the technical expertise to define various algorithmic parameters. For this purpose, we propose the Clustering-based Topic Modelling (ClusTop) algorithm, which constructs various types of word networks and automatically determines the discussion topics using community detection approaches. Unlike traditional topic models, ClusTop automatically determines the appropriate number of topics and does not require numerous parameters to be set. The ClusTop algorithm is also able to capture the syntactic meaning in tweets by using bigrams, trigrams and other word combinations to construct the word network graph. On three Twitter datasets with labelled crises and events as topics, ClusTop outperforms various baselines in terms of topic coherence, pointwise mutual information, precision, recall and F-score.
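The general idea in the abstract (build a word network from tweets, then let community detection decide how many topics there are) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_word_network` and `label_propagation` are hypothetical, the network uses plain unigram co-occurrence within a tweet rather than the bigram and trigram variants mentioned above, and a simple deterministic label propagation stands in for whichever community detection method the paper uses. The `max_rounds` cap is likewise an assumption of this sketch, added only to guarantee termination.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_word_network(tweets):
    """Weighted co-occurrence graph: words are nodes, and an edge weight
    counts how many tweets contain both words (an assumed construction)."""
    graph = defaultdict(Counter)
    for tweet in tweets:
        words = sorted(set(tweet.lower().split()))
        for a, b in combinations(words, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_rounds=20):
    """Stand-in community detection: each word repeatedly adopts the label
    with the highest total edge weight among its neighbours. Nodes are
    visited in sorted order and ties go to the smallest label, so the
    result is deterministic."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            votes = Counter()
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight
            best = max(votes.values())
            new_label = min(l for l, v in votes.items() if v == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    # Group words by final label; each group is treated as one topic.
    topics = defaultdict(set)
    for node, label in labels.items():
        topics[label].add(node)
    return sorted(sorted(words) for words in topics.values())


# Toy input (made up for illustration, not from the paper's datasets).
tweets = [
    "flood rescue river",
    "flood river warning",
    "election vote ballot",
    "vote ballot count",
]
topics = label_propagation(build_word_network(tweets))
for words in topics:
    print(words)
```

Note that the number of topics is never passed in: it falls out of how the labels settle on the graph, which is the property the abstract contrasts with traditional topic models.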

Full Text