Abstract

Topic modeling is a useful tool for discovering the abstract topics (collections of words) that govern a collection of documents; each document is then expressed as a mixture of the generated topics. The most basic topic model is Latent Dirichlet Allocation (LDA). In this paper, we develop a Gibbs sampling algorithm for Hierarchical Latent Dirichlet Allocation (HLDA) that incorporates time into the topic model. We call our model Hierarchical Latent Dirichlet Allocation with Topic Over Time (HLDA-TOT). We find topics for a collection of 1000 songs from the period 1990 to 2010, taken from the Million Song Dataset (MSD). We use Gibbs sampling for inference in both HLDA and HLDA-TOT. Our experimental results compare the performance of HLDA and HLDA-TOT and show that HLDA-TOT performs better in terms of 1) the number of topics generated at different depths, 2) the number of empty topics generated at different depths, and 3) held-out log likelihood at different depths.
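To make the inference setting concrete, the sketch below shows a collapsed Gibbs sampler for plain (flat) LDA, the basic model the abstract mentions. This is an illustrative sketch only, not the paper's algorithm: the HLDA-TOT sampler additionally samples paths in a topic tree and conditions on song timestamps, both of which are omitted here. All function and variable names are our own.

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Collapsed Gibbs sampling for flat LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, n_words).
    Returns (doc-topic count matrix, topic-word count matrix).
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics), dtype=int)  # doc-topic counts
    nkw = np.zeros((n_topics, n_words), dtype=int)    # topic-word counts
    nk = np.zeros(n_topics, dtype=int)                # tokens per topic
    z = []                                            # token-level topic assignments
    # Random initialization of every token's topic.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    # Sweep over tokens, resampling each from its full conditional.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # p(z = k | rest), up to a normalizing constant
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_words * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw
```

After sampling, row-normalizing `ndk + alpha` gives each document's topic mixture, and row-normalizing `nkw + beta` gives each topic's word distribution; the hierarchical and temporal extensions change the conditional distribution being sampled but follow the same count-and-resample pattern.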
