Exploiting semantic associative information in topic modeling

Meng-Sung Wu,Hung-Shin Lee,Hsin-Min Wang

doi:10.1109/slt.2010.5700883

Abstract

Topic modeling has been widely applied in a variety of text modeling tasks as well as in speech recognition systems for effectively capturing the semantic and statistic information in documents or speech utterances. Most topic models rely on the bag-of-words assumption that results in learned latent topics composed of lists of individual words. Unfortunately, these words may convey topical information but lack accurate semantic knowledge of the text. In this paper, we present the semantic associative topic model, where the concept of the semantic association terms is extended to topic modeling, which provides guidance on modeling the semantic associations that occur among single words by expressing a document as an association of multiple words. Further, the pointwise KL-divergence metric is used to measure the significance of the association. We also integrate original PLSA and SATM models, which have mixed feature representations. Experimental results on WSJ and AP datasets show that the proposed approaches achieved higher performance compared to other methods.

Full Text