Exploring coherent topics by topic modeling with term weighting

Ximing Li, Ang Zhang, Changchun Li, Jihong Ouyang, Yi Cai

doi:10.1016/j.ipm.2018.05.009

Abstract

Topic models often produce unexplainable topics that are filled with noisy words. The reason is that all words carry equal weight in topic modeling. High-frequency words dominate the top topic word lists, but most of them are meaningless, e.g., domain-specific stopwords. To address this issue, we investigate how to weight words and develop a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weights, we further suggest a combined form of EW (CEW) with two existing weighting schemes. Basically, CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modeling inference. We apply CEW to the Dirichlet multinomial mixture and latent Dirichlet allocation models, and evaluate it on topic quality, document clustering, and classification tasks over 8 real-world data sets. Experimental results show that weighting words can effectively improve topic modeling performance on both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.
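
To make the idea of entropy computed from word co-occurrences concrete, the following is a minimal sketch only. The abstract does not give the EW formula, so everything here is an assumption: the function names `cooccurrence_entropy` and `entropy_weights` are hypothetical, co-occurrence is counted at the document level, and the mapping `exp(-H)` from entropy to weight is an illustrative choice that simply gives lower weights to words whose co-occurrence neighbors are spread out (as stopword-like terms tend to be). It is not the authors' scheme.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_entropy(docs):
    """For each word w, return H(w) = -sum_v p(v|w) * log p(v|w).

    p(v|w) is estimated from document-level co-occurrence counts
    (an assumption; the paper may define co-occurrence differently).
    """
    co = defaultdict(Counter)
    for doc in docs:
        vocab = set(doc)  # count each word pair at most once per document
        for w in vocab:
            for v in vocab:
                if v != w:
                    co[w][v] += 1

    entropy = {}
    for w, counts in co.items():
        total = sum(counts.values())
        entropy[w] = -sum(
            (c / total) * math.log(c / total) for c in counts.values()
        )
    return entropy


def entropy_weights(docs):
    """Map entropy to a weight in (0, 1] with exp(-H).

    This mapping is hypothetical: it only illustrates "higher entropy,
    lower weight" and is not the EW or CEW formula from the paper.
    """
    return {w: math.exp(-h) for w, h in cooccurrence_entropy(docs).items()}


docs = [
    ["the", "cat", "purr"],
    ["the", "dog", "bark"],
    ["the", "cat", "meow"],
]
weights = entropy_weights(docs)
# "the" co-occurs with every other word, so it gets the lowest weight here.
for word in sorted(weights, key=weights.get):
    print(f"{word}: {weights[word]:.3f}")
```

In this toy corpus the word "the" appears alongside every other word, so its co-occurrence distribution has the highest entropy and it receives the smallest weight, which mirrors the behavior the abstract describes for meaningless high-frequency words.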

Journal: Information Processing & Management
Publication Date: Jun 1, 2018
Citations: 37

Similar Papers

The Impact of Weighting Schemes and Stemming Process on Topic Modeling of Arabic Long and Short Texts
Tinghuai Ma ... Bockarie Marah
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 19
12 Nov 2020

Topic Modelling with Fuzzy Document Representation
Nadeem Akhtar ... M M Sufyan Beg
01 Jan 2019

Incorporating Concept Information into Term Weighting Schemes for Topic Models
Huakui Zhang ... Changmeng Zheng
01 Jan 2020

BTM: Topic Modeling over Short Texts
Xueqi Cheng ... Yanyan Lan
IEEE Transactions on Knowledge and Data Engineering | VOL. 26
01 Dec 2014
