Abstract

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovering hidden topics in text corpora. The standard LDA model suffers from the problem that the topic assignments of individual words are independent, and it lacks a mechanism for exploiting rich prior background knowledge to learn semantically coherent topics. To address this problem, we propose a model called Entity Correlation Latent Dirichlet Allocation (EC-LDA), which incorporates constraints derived from entity correlations as prior knowledge into the LDA topic model. Unlike other knowledge-based topic models, which extract knowledge directly from the training dataset itself or even from human judgements, our work draws prior knowledge from an external knowledge base (Freebase, in our experiments). Hence, our approach is applicable to a wide range of text corpora in different scenarios. We fit the proposed model using Gibbs sampling. Experimental results demonstrate the effectiveness of our model compared with standard LDA.
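To make the baseline concrete, the sketch below shows a minimal collapsed Gibbs sampler for standard LDA, the model that EC-LDA extends with entity-correlation constraints. This is an illustration of the sampling scheme only, not the paper's EC-LDA implementation; the function name, hyperparameter values, and data layout are assumptions for the example.

```python
# Minimal collapsed Gibbs sampler for standard LDA (illustrative only;
# EC-LDA additionally incorporates entity-correlation constraints,
# which are not modeled here).
import numpy as np

def lda_gibbs(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """docs: list of documents, each a list of word ids in [0, V)."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # document-topic counts
    nkw = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # total words assigned to each topic
    # Random initial topic assignment for every word token.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]          # remove the current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional p(z_{d,i} = k | everything else).
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k          # record the new assignment
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    # Posterior point estimates of topic-word and document-topic dists.
    phi = (nkw + beta) / (nk[:, None] + V * beta)
    theta = (ndk + alpha) / (ndk.sum(1, keepdims=True) + K * alpha)
    return phi, theta
```

A knowledge-based variant along the lines of EC-LDA would modify the full conditional above so that words linked by entity correlations are encouraged to share topic assignments, rather than being sampled fully independently.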

