Abstract

Probabilistic topic models, which aim to discover latent topics in text corpora define each document as a multinomial distributions over topics and each topic as a multinomial distributions over words. Although, humans can infer a proper label for each topic by looking at top representative words of the topic but, it is not applicable for machines. Automatic Topic Labeling techniques try to address the problem. The ultimate goal of topic labeling techniques are to assign interpretable labels for the learned topics. In this paper, we are taking concepts of ontology into consideration instead of words alone to improve the quality of generated labels for each topic. Our work is different in comparison with the previous efforts in this area, where topics are usually represented with a batch of selected words from topics. We have highlighted some aspects of our approach including: 1) we have incorporated ontology concepts with statistical topic modeling in a unified framework, where each topic is a multinomial probability distribution over the concepts and each concept is represented as a distribution over words; and 2) a topic labeling model according to the meaning of the concepts of the ontology included in the learned topics. The best topic labels are selected with respect to the semantic similarity of the concepts and their ontological categorizations. We demonstrate the effectiveness of considering ontological concepts as richer aspects between topics and words by comprehensive experiments on two different data sets. In another word, representing topics via ontological concepts shows an effective way for generating descriptive and representative labels for the discovered topics.

Highlights

  • Probabilistic topic models such as Latent Dirichlet Allocation (LDA) [1] has been getting considerable attention

  • We first presented our Knowledge-based topic model, KB-LDA model, in [11] where we showed that incorporating ontological concepts with topic models improves the quality of topic labeling

  • Our contributions in this work are as follows: 1) In a very high level, we propose a Knowledge-based topic model, namely, KB-LDA, which integrates an ontology as a knowledge base into the statistical topic models in a principled way

Read more

Summary

A Knowledge-based Topic Modeling Approach for Automatic Topic Labeling

We have highlighted some aspects of our approach including: 1) we have incorporated ontology concepts with statistical topic modeling in a unified framework, where each topic is a multinomial probability distribution over the concepts and each concept is represented as a distribution over words; and 2) a topic labeling model according to the meaning of the concepts of the ontology included in the learned topics. We demonstrate the effectiveness of considering ontological concepts as richer aspects between topics and words by comprehensive experiments on two different data sets. In another word, representing topics via ontological concepts shows an effective way for generating descriptive and representative labels for the discovered topics

INTRODUCTION
BACKGROUND
Probabilistic Topic Models
RELATED WORK
PROBLEM FORMULATION
The KB-LDA Topic Model
Topic Label Graph Extraction
Results
VIII. CONCLUSIONS
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call