Abstract

Topic modeling discovers the hidden topics in a document collection. Most existing topic models focus only on word usage, generating topics from word frequency and co-occurrence without considering the meaning of the text. In this paper, we propose a novel approach that generates a semantic pattern-based topic representation grounded in the meaning of the text to represent the topics in a document collection. The proposed approach considers both the semantics and the co-occurrence of words to generate a set of frequent semantic patterns representing each topic. The semantics are captured by matching the words in each topic with concepts in the Probase ontology. A set of frequent semantic patterns is then generated for each topic from the co-occurrence of the matched words. Hence, our approach differs from traditional topic models in that it produces meaningful frequent semantic patterns derived from the ontology. The proposed topic representation was evaluated in terms of topic quality and information filtering performance against a set of state-of-the-art systems. Perplexity, coherence, and topic word distribution were examined in the topic quality evaluation, and the generated frequent semantic patterns were used as features in the information filtering evaluation. Our topic representation outperformed the baseline systems in all evaluations.
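As a rough illustration of the idea described above (not the paper's actual algorithm), the sketch below maps topic words to concepts through a toy dictionary standing in for the Probase lookup and then counts co-occurring concept sets as frequent semantic patterns. All names, the concept dictionary, and the support threshold are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Toy stand-in for the Probase concept lookup; in the paper, topic words
# are matched against concepts in the Probase ontology.
WORD_TO_CONCEPT = {
    "python": "programming language",
    "java": "programming language",
    "regression": "statistical method",
    "clustering": "statistical method",
    "gpu": "hardware",
}


def to_concepts(doc_words):
    """Map a document's topic words to their (toy) ontology concepts."""
    return {WORD_TO_CONCEPT[w] for w in doc_words if w in WORD_TO_CONCEPT}


def frequent_semantic_patterns(docs, min_support=2, max_len=2):
    """Count co-occurring concept sets and keep those meeting min_support."""
    counts = Counter()
    for doc in docs:
        concepts = sorted(to_concepts(doc))
        for size in range(1, max_len + 1):
            for pattern in combinations(concepts, size):
                counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    documents = [
        ["python", "regression", "gpu"],
        ["java", "clustering"],
        ["python", "clustering", "gpu"],
    ]
    for pattern, support in frequent_semantic_patterns(documents).items():
        print(pattern, support)
```

In this simplified view, each retained concept set plays the role of a frequent semantic pattern for a topic; the paper's method operates on topics produced by a topic model and on the full Probase concept space rather than a hand-made dictionary.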
