Abstract

Probabilistic topic models have proven very useful for many text mining tasks. Although many variants of topic models have been proposed, most existing work is based on the bag‐of‐words representation of text, in which word combinations and word order are generally ignored, resulting in inaccurate semantic representations. In this paper, we propose a general way to go beyond the bag‐of‐words representation for topic modeling: we apply frequent pattern mining to discover frequent word patterns that capture semantic associations between words, and then use these patterns as supplementary semantic units to augment the conventional bag‐of‐words representation. By viewing a topic model as a generative model for such augmented text data, we can go beyond the bag‐of‐words assumption and potentially capture more semantic associations between words. Since efficient algorithms for mining frequent word patterns are available, this general strategy can be applied to improve any topic model without substantially increasing its computational complexity. Experimental results show that this frequent pattern‐based data enrichment approach improves over two representative existing probabilistic topic models on a classification task. We also studied variations of frequent pattern usage in topic modeling and found that using compressed and closed patterns performs best.
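To make the enrichment idea concrete, below is a minimal sketch of the pattern-augmentation pipeline the abstract describes. It is not the paper's implementation: it mines frequent word pairs as a simple stand-in for general (compressed or closed) frequent patterns, appends each discovered pattern to a document as an extra pseudo-token, and fits an off-the-shelf LDA model on the augmented corpus. The corpus, the MIN_SUPPORT threshold, and the pseudo-token naming scheme are all illustrative assumptions.

```python
# Sketch of frequent-pattern-based data enrichment for topic modeling.
# Assumption: frequent word pairs stand in for general frequent patterns;
# the paper mines richer (compressed/closed) patterns.
from itertools import combinations
from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # toy corpus for illustration only
    "topic models learn word distributions",
    "frequent pattern mining finds word associations",
    "topic models ignore word order",
]

MIN_SUPPORT = 2  # a pattern must occur in at least this many documents

# Count document-level co-occurrences of word pairs (order-insensitive).
pair_counts = Counter()
for doc in docs:
    words = sorted(set(doc.split()))
    pair_counts.update(combinations(words, 2))

frequent_pairs = {p for p, c in pair_counts.items() if c >= MIN_SUPPORT}

# Augment each document: keep the original words (the bag-of-words part)
# and add one pseudo-token per frequent pattern it contains,
# e.g. "topic_models", as a supplementary semantic unit.
augmented = []
for doc in docs:
    words = set(doc.split())
    extras = ["_".join(p) for p in frequent_pairs if set(p) <= words]
    augmented.append(doc + " " + " ".join(extras))

# Fit a standard topic model on the augmented corpus; the topic model
# itself is unchanged, only its input representation is enriched.
X = CountVectorizer().fit_transform(augmented)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
```

Because the augmentation happens purely at the data level, any topic model can be swapped in after the enrichment step, which is the source of the approach's generality.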
