Abstract

Topic modelling is a technique for inferring themes and topics from a large collection of documents. Latent Dirichlet allocation (LDA) is the most widely used technique in the topic-modelling literature. LDA is generative in nature: it models documents as produced from multinomial distributions over words, and the generative process is then run in reverse, estimating the model's parameters to deduce topics and themes from unstructured documents. Many approximate posterior inference algorithms exist for topic models, and the dominant inference techniques for LDA are variational expectation maximization (VEM) and Gibbs sampling. In this paper, we evaluate the performance of the VEM and Gibbs sampling techniques on the Associated Press data set and the Accepted Papers data set by fitting topic models using LDA. In this experiment, we consider perplexity and entropy as the significant metrics for evaluating topic models. We found that for a large data set like the Associated Press data set, with 2000 documents, variational inference is the better inference technique, while for a small data set like Accepted Papers, Gibbs sampling is the better choice. Another advantage of Gibbs sampling is that it runs a Markov chain and so avoids getting trapped in local minima, while variational inference provides fast and deterministic solutions.
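
Perplexity, the primary metric above, is conventionally computed on held-out documents. The abstract does not restate the formula, but the standard definition, going back to Blei, Ng, and Jordan's original LDA paper, is

    \mathrm{perplexity}(D_{\text{test}}) = \exp\!\left( -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)

where M is the number of held-out documents, \mathbf{w}_d is the word sequence of document d, and N_d is its length; lower perplexity indicates better generalization.

As an illustration only (the abstract does not say which implementation the authors used, and the toy corpus, topic count, and parameter values below are assumptions), the VEM-style side of the comparison can be sketched in Python: scikit-learn's LatentDirichletAllocation fits LDA by variational Bayes, a close relative of VEM, and exposes a perplexity method, while collapsed Gibbs sampling is available from the separate lda package on PyPI.

    # Hypothetical sketch: fit LDA with variational inference and score
    # held-out perplexity. Corpus and parameters are illustrative only.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.model_selection import train_test_split

    docs = [
        "stock markets fell sharply on trade fears",
        "the senate passed the budget bill late tuesday",
        "researchers trained a neural network on text data",
        "the team won the championship after extra time",
    ] * 50  # small synthetic stand-in for a news corpus

    X = CountVectorizer(stop_words="english").fit_transform(docs)
    X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

    # Variational Bayes inference (scikit-learn's batch mode is its
    # closest built-in analogue of VEM).
    vem = LatentDirichletAllocation(
        n_components=4, learning_method="batch", random_state=0
    ).fit(X_train)

    # Lower held-out perplexity indicates a better-fitting model.
    print("VEM held-out perplexity:", vem.perplexity(X_test))

    # Collapsed Gibbs sampling is not in scikit-learn; one option is the
    # separate `lda` package (pip install lda):
    #   import lda
    #   gibbs = lda.LDA(n_topics=4, n_iter=500, random_state=0).fit(X_train)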
