HTM

Congkai Sun, Zhenfu Cao, Bin Gao, Hang Li

doi:10.3115/1613715.1613779

Abstract

Earlier topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed to model the contents of plain text. More recently, topic models for hypertexts such as web pages have also been proposed. These hypertext models are generative models that give rise to both words and hyperlinks. This paper argues that, to better represent the contents of hypertexts, it is more essential to treat the hyperlinks as fixed and to define the topic model as one that generates words only. The paper then proposes a new topic model for hypertext processing, referred to as the Hypertext Topic Model (HTM). HTM defines the distribution of words in a document (i.e., the content of the document) as a mixture over the latent topics of the document itself and the latent topics of the documents it cites. The topics are in turn characterized as distributions over words, as in conventional topic models. The paper further proposes a method for learning the HTM model. Experimental results show that HTM outperforms the baselines on topic discovery and document classification on three datasets.
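The mixture described in the abstract can be illustrated with a small sketch. The abstract does not give the exact parameterization, so everything below is an assumption made for illustration: the function name `word_distribution`, the mixing weight `lam` between a document's own topics and those of its cited documents, and the uniform weighting over cited documents are all hypothetical and may differ from the model defined in the paper. Hyperlinks are passed in as fixed input (`cites`), and only words are generated.

```python
from typing import Dict, List


def word_distribution(
    doc: str,
    theta: Dict[str, List[float]],   # theta[d][z]: topic proportions of document d
    phi: List[List[float]],          # phi[z][w]: word distribution of topic z
    cites: Dict[str, List[str]],     # fixed hyperlinks: documents that d cites
    lam: float = 0.5,                # assumed weight on the document's own topics
) -> List[float]:
    """Return p(w | doc) as a mixture over own and cited-document topics."""
    n_topics = len(phi)
    n_words = len(phi[0])

    def topic_mixture(d: str) -> List[float]:
        # Plain topic-model word distribution: sum_z p(z | d) * p(w | z)
        return [
            sum(theta[d][z] * phi[z][w] for z in range(n_topics))
            for w in range(n_words)
        ]

    own = topic_mixture(doc)
    cited = cites.get(doc, [])
    if not cited:
        # No outgoing links: fall back to the document's own topics only.
        return own

    # Assumption: cited documents are weighted uniformly.
    cited_mixtures = [topic_mixture(c) for c in cited]
    cited_avg = [
        sum(m[w] for m in cited_mixtures) / len(cited_mixtures)
        for w in range(n_words)
    ]
    return [lam * own[w] + (1.0 - lam) * cited_avg[w] for w in range(n_words)]


# Toy example: two topics, three words, document "a" cites document "b".
phi = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
theta = {"a": [0.9, 0.1], "b": [0.2, 0.8]}
cites = {"a": ["b"]}
p_a = word_distribution("a", theta, phi, cites, lam=0.5)
p_b = word_distribution("b", theta, phi, cites, lam=0.5)
```

In this toy setup, the words favored by the topics of the cited document "b" receive more probability in "a" than they would under the topics of "a" alone, which is the effect the abstract attributes to conditioning on fixed hyperlinks.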

