Abstract

This work proposes a computational approach to a task deeply related to the human sciences: employing natural language processing for text analysis. Researchers in that field often need to extract information from large masses of textual data. One such application is topic modelling, the task of discovering the topics discussed in a collection of texts. Several techniques are available for it, including Latent Dirichlet Allocation (LDA), the Biterm Topic Model (BTM), BERTopic (topic modelling based on Bidirectional Encoder Representations from Transformers) and Non-negative Matrix Factorization (NMF). In this work, we design a methodological setup and perform a comparative analysis of these techniques on data retrieved from Twitter. Through this social medium, we seek to contribute to the study of political, economic and social issues, as well as to assess the relative merits of the topic modelling techniques. The results indicate that BERTopic achieves the highest topic coherence, followed by NMF, then BTM and, lastly, LDA.
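To make the kind of comparison described above concrete, the sketch below shows, under the assumption of a gensim-based pipeline (not the authors' actual code), how two of the compared models, LDA and NMF, could be trained on a toy set of preprocessed tweets and ranked by c_v topic coherence; the documents, topic counts and parameter values are illustrative placeholders rather than the paper's setup.

```python
from gensim import corpora
from gensim.models import LdaModel
from gensim.models.nmf import Nmf
from gensim.models.coherencemodel import CoherenceModel

# Hypothetical, already-preprocessed tweets (tokenised, stopwords removed).
docs = [
    ["election", "vote", "congress", "policy"],
    ["inflation", "prices", "economy", "market"],
    ["protest", "rights", "social", "policy"],
    ["market", "stocks", "economy", "growth"],
]

# Build the bag-of-words representation shared by both models.
dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]

# Fit each topic model with the same (illustrative) number of topics.
models = {
    "LDA": LdaModel(bow_corpus, num_topics=2, id2word=dictionary, random_state=42),
    "NMF": Nmf(bow_corpus, num_topics=2, id2word=dictionary, random_state=42),
}

# Score each model with the c_v coherence measure and print the comparison.
for name, model in models.items():
    coherence = CoherenceModel(
        model=model, texts=docs, dictionary=dictionary, coherence="c_v"
    ).get_coherence()
    print(f"{name}: c_v coherence = {coherence:.3f}")
```

BERTopic and BTM would be evaluated analogously with their own libraries, keeping the coherence measure fixed so that the scores remain comparable across models.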
