Abstract

Twitter is a popular microblogging service where users frequently discuss topics of interest, ranging from the popular (e.g., music) to the niche (e.g., politics). Given the large volume of tweets, a key challenge is to automatically model and determine the discussion topics without prior knowledge of the types and number of topics, and without requiring the technical expertise to define various algorithmic parameters. For this purpose, we propose the Clustering-based Topic Modelling (ClusTop) algorithm, which constructs various types of word networks and automatically determines the discussion topics using community detection approaches. Unlike traditional topic models, ClusTop automatically determines the appropriate number of topics and does not require numerous parameters to be set. ClusTop also captures the syntactic meaning in tweets by using bigrams, trigrams and other word combinations to construct the word network graph. On three Twitter datasets with labelled crises and events as topics, ClusTop outperforms various baselines in terms of topic coherence, pointwise mutual information, precision, recall and F-score.
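
The general idea of building a word network from tweets and reading topics off its structure can be illustrated with a minimal sketch. This is not the ClusTop implementation: the function names `build_bigram_network` and `detect_topics`, the `min_weight` threshold, and the toy tweets are all assumptions made for illustration, and connected components of a pruned graph stand in for a proper community detection algorithm. The point it shows is that the number of topics falls out of the graph rather than being set in advance.

```python
from collections import Counter, defaultdict


def build_bigram_network(tweets):
    """Count how often each pair of adjacent words (a bigram) occurs across tweets."""
    edge_weights = Counter()
    for tweet in tweets:
        words = tweet.lower().split()
        for left, right in zip(words, words[1:]):
            if left != right:  # skip self-loops such as "very very"
                edge_weights[frozenset((left, right))] += 1
    return edge_weights


def detect_topics(edge_weights, min_weight=2):
    """Group words into topics.

    Assumption: edges seen fewer than `min_weight` times are dropped, and each
    connected component of what remains is treated as one topic. This is a
    simple stand-in for community detection, not the paper's method.
    """
    neighbours = defaultdict(set)
    for edge, weight in edge_weights.items():
        if weight >= min_weight:
            a, b = tuple(edge)
            neighbours[a].add(b)
            neighbours[b].add(a)

    topics, seen = [], set()
    for start in sorted(neighbours):  # sorted for a deterministic topic order
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            word = stack.pop()
            if word in component:
                continue
            component.add(word)
            stack.extend(neighbours[word] - component)
        seen |= component
        topics.append(sorted(component))
    return topics


# Hypothetical toy tweets covering two unrelated events.
tweets = [
    "earthquake hits city",
    "earthquake hits coast",
    "strong earthquake hits city",
    "election results announced",
    "election results delayed",
    "final election results announced",
]
topics = detect_topics(build_bigram_network(tweets))
print(topics)
# [['announced', 'election', 'results'], ['city', 'earthquake', 'hits']]
```

On these toy tweets the sketch recovers two word groups without being told how many to look for. A real system would likely replace the component step with a modularity-based community detection method and add tokenisation and stop-word handling.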
