Abstract

It is useful for many applications to discover meaningful topics in short texts, such as tweets and comments on websites. Directly applying conventional topic models (e.g., LDA) to short texts often produces poor results, so the biterm topic model (BTM) was recently proposed as a general approach to short texts. However, the original BTM implementation uses collapsed Gibbs sampling (CGS) for inference, which requires many iterations over the entire dataset. For LDA, by contrast, many fast inference algorithms have been proposed over the past decade. Among them, the recently proposed stochastic collapsed variational Bayesian inference (SCVB0) is promising because it is applicable to an online setting and takes advantage of the collapsed representation, which yields an improved variational bound. Applying the idea of SCVB0, we develop a fast one-pass inference algorithm for BTM that can be used to analyze large-scale general short texts and is extensible to an online setting. To evaluate the proposed algorithm, we conducted several experiments on short texts from Twitter. The results show that our algorithm discovers meaningful topics significantly faster than the original algorithm.
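To illustrate the modeling unit the abstract refers to, the following is a minimal sketch of biterm extraction, the preprocessing step BTM applies to each short document: every unordered pair of words co-occurring in the same document becomes a "biterm", and the corpus is then modeled as a bag of biterms rather than per-document word counts. The function name and example tokens are illustrative, not from the paper.

```python
from itertools import combinations

def extract_biterms(doc_tokens):
    """Enumerate all unordered word pairs (biterms) in one short document.

    BTM models the whole corpus as a bag of such biterms, which mitigates
    the word-count sparsity of individual short texts like tweets.
    """
    # Canonicalize each pair by sorting so (a, b) and (b, a) coincide.
    return [tuple(sorted(pair)) for pair in combinations(doc_tokens, 2)]

# Example: a 3-word tweet yields 3 biterms.
biterms = extract_biterms(["topic", "model", "tweet"])
print(biterms)
```

A document of length n contributes n(n-1)/2 biterms, so aggregating biterms over a large corpus is cheap per document but grows with corpus size, which is why a one-pass inference scheme matters at scale.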
