Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data

Christoph Weisser, Andre Python, Benjamin Säfken, Arik Reuter, Christoph Gerloff, Anton Thielmann, Thomas Kneib

doi:10.1007/s00180-022-01246-z

Open Access

https://doi.org/10.1007/s00180-022-01246-z

Abstract

Topic models are a useful and popular method to find latent topics of documents. However, the short and sparse texts in social media micro-blogs such as Twitter are challenging for the most commonly used Latent Dirichlet Allocation (LDA) topic model. We compare the performance of the standard LDA topic model with the Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), which are specifically designed for sparse data. To compare the performance of the three models, we propose the simulation of pseudo-documents as a novel evaluation method. In a case study with short and sparse text, the models are evaluated on tweets filtered by keywords relating to the Covid-19 pandemic. We find that standard coherence scores that are often used for the evaluation of topic models perform poorly as an evaluation metric. The results of our simulation-based approach suggest that the GSDMM and GPM topic models may generate better topics than the standard LDA model.
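The abstract does not spell out how the pseudo-documents are simulated, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that each short pseudo-document is drawn from exactly one known topic (a mixture-of-unigrams assumption), so that a model's topic assignments can be checked against the known labels instead of relying on coherence scores. The toy topics, the function names `simulate_pseudo_documents` and `purity`, and the vocabulary-overlap rule standing in for a fitted topic model are all hypothetical.

```python
import random
from collections import Counter


def simulate_pseudo_documents(topic_word_probs, n_docs, doc_len, seed=0):
    """Draw short pseudo-documents, each generated from exactly one known topic.

    topic_word_probs: list of dicts mapping word -> probability (hypothetical topics).
    Returns a list of (topic_index, tokens) pairs.
    """
    rng = random.Random(seed)
    docs = []
    for _ in range(n_docs):
        # Assumption: topics are chosen uniformly at random.
        k = rng.randrange(len(topic_word_probs))
        words = list(topic_word_probs[k].keys())
        weights = list(topic_word_probs[k].values())
        tokens = rng.choices(words, weights=weights, k=doc_len)
        docs.append((k, tokens))
    return docs


def purity(true_labels, predicted_labels):
    """Share of documents that belong to the majority true topic of their predicted cluster."""
    clusters = {}
    for true, pred in zip(true_labels, predicted_labels):
        clusters.setdefault(pred, Counter())[true] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(true_labels)


# Hypothetical toy topics with disjoint vocabularies.
TOPICS = [
    {"vaccine": 0.5, "dose": 0.3, "trial": 0.2},
    {"lockdown": 0.5, "school": 0.3, "travel": 0.2},
]

docs = simulate_pseudo_documents(TOPICS, n_docs=200, doc_len=8, seed=42)
true_labels = [k for k, _ in docs]

# Stand-in for a fitted topic model: assign each document to the topic
# whose vocabulary covers the most of its tokens.
predicted = [
    max(range(len(TOPICS)), key=lambda k: sum(tok in TOPICS[k] for tok in tokens))
    for _, tokens in docs
]

print(purity(true_labels, predicted))
```

In an actual comparison, the stand-in assignment rule would be replaced by the topic assignments produced by LDA, GSDMM or GPM, and purity could be swapped for any other measure of agreement with the known labels.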

Journal: Computational Statistics
Publication Date: Jul 9, 2022
Citations: 14
License type: open-access
