Abstract

This work proposes a computational approach to a task deeply related to the human sciences: employing natural language processing for text analysis. Researchers in that field often need to extract information from large masses of textual data. One such application is topic modelling, the task of discovering the topics discussed in a collection of texts. Several techniques are available for it, including Latent Dirichlet Allocation (LDA), the Biterm Topic Model (BTM), BERTopic (topic modelling based on Bidirectional Encoder Representations from Transformers) and Non-negative Matrix Factorization (NMF). In this work, we design a methodological setup and perform a comparative analysis of these techniques on data retrieved from Twitter. Through this social medium, we seek to contribute to the study of political, economic and social issues, as well as to assess the relative merits of the topic modelling techniques. The results indicate that BERTopic achieves the highest topic coherence, followed by NMF, then BTM and, lastly, LDA.
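To make the kind of comparison described above concrete, the sketch below shows, under the assumption of a gensim-based pipeline (not the authors' actual code), how two of the compared models, LDA and NMF, could be trained on a toy set of preprocessed tweets and ranked by c_v topic coherence; the documents, topic counts and parameter values are illustrative placeholders rather than the paper's setup.

```python
from gensim import corpora
from gensim.models import LdaModel
from gensim.models.nmf import Nmf
from gensim.models.coherencemodel import CoherenceModel

# Hypothetical, already-preprocessed tweets (tokenised, stopwords removed).
docs = [
    ["election", "vote", "congress", "policy"],
    ["inflation", "prices", "economy", "market"],
    ["protest", "rights", "social", "policy"],
    ["market", "stocks", "economy", "growth"],
]

# Build the bag-of-words representation shared by both models.
dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]

# Fit each topic model with the same (illustrative) number of topics.
models = {
    "LDA": LdaModel(bow_corpus, num_topics=2, id2word=dictionary, random_state=42),
    "NMF": Nmf(bow_corpus, num_topics=2, id2word=dictionary, random_state=42),
}

# Score each model with the c_v coherence measure and print the comparison.
for name, model in models.items():
    coherence = CoherenceModel(
        model=model, texts=docs, dictionary=dictionary, coherence="c_v"
    ).get_coherence()
    print(f"{name}: c_v coherence = {coherence:.3f}")
```

BERTopic and BTM would be evaluated analogously with their own libraries, keeping the coherence measure fixed so that the scores remain comparable across models.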
