User interest modeling by labeled LDA with topic features

Wenfeng Li, Rile Hu, Xiaojie Wang, Jilei Tian

doi:10.1109/ccis.2011.6045022

Abstract

It is well known that a user's interests are reflected in his/her web browsing history and can be mined from it. This paper presents an innovative method to extract a user's interests from his/her web browsing history. We first apply an efficient algorithm to extract useful text from the web pages in the user's sequence of browsed URLs. We then propose Labeled Latent Dirichlet Allocation with Topic Feature (LLDA-TF) to mine the user's interests from that text. Unlike other work, which needs a large amount of training data to train a model that incorporates supervised information, we introduce the raw supervised information directly into the LLDA-TF procedure. The experimental results show that the results given by LLDA-TF fit the predefined categories well. Furthermore, the LLDA-TF model can name each user interest with a category word and provide a keyword list for each category.

Full Text