Abstract

Topic modeling is a popular research area and is widely used in text-mining applications. Many researchers have noted that the topics learned by LDA, each a multinomial distribution over the word vocabulary, are often unintuitive in terms of human recognition and communication. We observe that, for a given topic, its most frequent words are often less informative than words that are dedicated to that topic. In this paper, aiming to learn discriminative topics, we introduce a measure named word discriminability that captures a word's ability to distinguish between topics, and we propose an iterative algorithm that trains and exploits word discriminability information during topic learning. Experimental results show that applying our method to the LDA topic model significantly improves document classification accuracy, yields more discriminative topics, and produces top words that are more representative of their topics.
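
The abstract does not give the formal definition of word discriminability, but the underlying intuition can be illustrated with a minimal sketch: score each word by how concentrated its topic membership is across an LDA topic-word matrix. The entropy-based score below, the toy corpus, and the use of scikit-learn's LDA are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch: an entropy-based word-discriminability score over an
# LDA topic-word matrix. Words concentrated in one topic score near 1;
# words spread evenly across topics score near 0.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "neural networks learn representations from data",
    "stock markets react to interest rate changes",
    "deep learning models require large data sets",
    "investors watch inflation and interest rates",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# phi[k, w]: probability of word w under topic k
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# p(topic | word), assuming a uniform prior over topics for simplicity
p_topic_given_word = phi / phi.sum(axis=0, keepdims=True)

# Discriminability = 1 - normalized entropy of p(topic | word)
entropy = -(p_topic_given_word * np.log(p_topic_given_word + 1e-12)).sum(axis=0)
discriminability = 1.0 - entropy / np.log(phi.shape[0])

vocab = vectorizer.get_feature_names_out()
for word, score in sorted(zip(vocab, discriminability), key=lambda t: -t[1])[:5]:
    print(f"{word}: {score:.3f}")
```

In an iterative scheme of the kind the abstract describes, such per-word scores could be recomputed after each training pass and used to reweight words when the topics are re-estimated, though the exact update used in the paper is not specified here.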
