Community pooling: LDA topic modeling in Twitter

Esteban Feuerstein, Federico Albanese

doi:10.52591/lxai2021072410

Abstract

Social networks play a fundamental role in the propagation of information and news. Characterizing the content of the messages becomes vital for tasks like fake news detection or personalized message recommendation. However, Twitter posts are short and often less coherent than other text documents, which makes it challenging to apply text mining algorithms efficiently. We propose a new pooling scheme for topic modeling in Twitter, which groups tweets whose authors belong to the same community in the retweet network into a single document. Our findings contribute to an improved methodology for identifying the latent topics in a Twitter dataset, without modifying the basic machinery of a topic decomposition model. In particular, we use Latent Dirichlet Allocation (LDA) and show empirically that this novel method achieves better results than previous pooling methods in terms of cluster quality, document retrieval tasks, supervised machine learning classification, and overall run time.
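
The pooling step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy data, and the use of connected components as a stand-in for a proper community detection algorithm are all assumptions made for illustration; the abstract does not specify how communities are detected.

```python
from collections import defaultdict


def find_communities(retweet_edges):
    """Map each user to a community id.

    ASSUMPTION: connected components of the retweet network (via union-find)
    stand in for a real community detection algorithm.
    """
    parent = {}

    def find(user):
        parent.setdefault(user, user)
        while parent[user] != user:
            parent[user] = parent[parent[user]]  # path halving
            user = parent[user]
        return user

    for retweeter, original_author in retweet_edges:
        root_a, root_b = find(retweeter), find(original_author)
        if root_a != root_b:
            parent[root_b] = root_a

    return {user: find(user) for user in list(parent)}


def community_pooling(tweets, retweet_edges):
    """Concatenate tweets whose authors share a community into one document."""
    community_of = find_communities(retweet_edges)
    pooled = defaultdict(list)
    for author, text in tweets:
        # Authors absent from the retweet network keep their own pool.
        key = community_of.get(author, author)
        pooled[key].append(text)
    return {key: " ".join(texts) for key, texts in pooled.items()}


# Toy data, invented for illustration only.
retweet_edges = [("ana", "bob"), ("bob", "cleo"), ("dan", "eve")]
tweets = [
    ("ana", "election debate tonight"),
    ("cleo", "candidates clash in debate"),
    ("dan", "great goal in the final"),
    ("eve", "final match highlights"),
    ("zed", "unrelated post"),
]
documents = community_pooling(tweets, retweet_edges)
```

Each pooled document would then be tokenized and passed, unchanged, to an off-the-shelf LDA implementation, which is consistent with the claim that the topic model itself needs no modification.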
