Linked Topic and Interest Model for Web Forums

Victor Cheng, C.H. Li

doi:10.1109/wiiat.2008.227

Abstract

In Web forum analysis, both the discussion topics and the interests of authors are of major concern. We introduce a linked topic and interest model, based on latent Dirichlet allocation (LDA), to explore discussion topics and author interests. Rather than using two separate models, or modeling combined topics and interests with a single hidden topic assignment variable, the proposed model has separate but linked hidden variables for topic and interest exploration. Because exact inference of the model parameters is intractable, Gibbs sampling is employed to estimate the topic, author, and interest distributions. The joint distribution of the linked hidden variables also provides an interpretation of an interest in terms of weighted topics, or vice versa. We apply the model to a NIPS data set and to a corpus containing the text of a popular digital camera Web forum, and demonstrate the topics and interests discovered by the model. The generalization capability of the model is also assessed by means of perplexity; the results show that the linked topic and interest model outperforms both the LDA document topic model and the author topic model.

Full Text