Abstract

Topic modelling is a technique for inferring themes and topics from a large collection of documents. Latent Dirichlet allocation (LDA) is the most widely used technique in the topic-modelling literature. LDA is generative in nature: it models documents as produced from multinomial distributions over words, and the generative process is then run in reverse, estimating the model's parameters to deduce topics and themes from unstructured documents. Many approximate posterior inference algorithms exist for topic models, and the dominant inference techniques for LDA are variational expectation maximization (VEM) and Gibbs sampling. In this paper, we evaluate the performance of the VEM and Gibbs sampling techniques on the Associated Press data set and the Accepted Papers data set by fitting topic models using LDA. In this experiment, we consider perplexity and entropy as the significant metrics for evaluating topic models. We found that for a large data set like the Associated Press data set, with 2000 documents, variational inference is the better inference technique, while for a small data set like Accepted Papers, Gibbs sampling is the better choice. Another advantage of Gibbs sampling is that it runs a Markov chain and so avoids getting trapped in local minima, while variational inference provides fast and deterministic solutions.
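
Perplexity, the primary metric above, is conventionally computed on held-out documents. The abstract does not restate the formula, but the standard definition, going back to Blei, Ng, and Jordan's original LDA paper, is

    \mathrm{perplexity}(D_{\text{test}}) = \exp\!\left( -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)

where M is the number of held-out documents, \mathbf{w}_d is the word sequence of document d, and N_d is its length; lower perplexity indicates better generalization.

As an illustration only (the abstract does not say which implementation the authors used, and the toy corpus, topic count, and parameter values below are assumptions), the VEM-style side of the comparison can be sketched in Python: scikit-learn's LatentDirichletAllocation fits LDA by variational Bayes, a close relative of VEM, and exposes a perplexity method, while collapsed Gibbs sampling is available from the separate lda package on PyPI.

    # Hypothetical sketch: fit LDA with variational inference and score
    # held-out perplexity. Corpus and parameters are illustrative only.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.model_selection import train_test_split

    docs = [
        "stock markets fell sharply on trade fears",
        "the senate passed the budget bill late tuesday",
        "researchers trained a neural network on text data",
        "the team won the championship after extra time",
    ] * 50  # small synthetic stand-in for a news corpus

    X = CountVectorizer(stop_words="english").fit_transform(docs)
    X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

    # Variational Bayes inference (scikit-learn's batch mode is its
    # closest built-in analogue of VEM).
    vem = LatentDirichletAllocation(
        n_components=4, learning_method="batch", random_state=0
    ).fit(X_train)

    # Lower held-out perplexity indicates a better-fitting model.
    print("VEM held-out perplexity:", vem.perplexity(X_test))

    # Collapsed Gibbs sampling is not in scikit-learn; one option is the
    # separate `lda` package (pip install lda):
    #   import lda
    #   gibbs = lda.LDA(n_topics=4, n_iter=500, random_state=0).fit(X_train)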
