A biterm topic model for short texts

Xiaohui Yan, Xueqi Cheng, Yanyan Lan, Jiafeng Guo

doi:10.1145/2488388.2488514

Abstract

Uncovering the topics within short texts, such as tweets and instant messages, has become an important task for many content analysis applications. However, directly applying conventional topic models (e.g., LDA and PLSA) to such short texts may not work well. The fundamental reason is that conventional topic models implicitly capture document-level word co-occurrence patterns to reveal topics, and thus suffer from the severe data sparsity in short documents. In this paper, we propose a novel way of modeling topics in short texts, referred to as the biterm topic model (BTM). Specifically, in BTM we learn the topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the whole corpus. The major advantages of BTM are that 1) BTM explicitly models the word co-occurrence patterns to enhance the topic learning; and 2) BTM uses the aggregated patterns in the whole corpus for learning topics, which addresses the problem of sparse word co-occurrence patterns at the document level. We carry out extensive experiments on real-world short text collections. The results demonstrate that our approach can discover more prominent and coherent topics, and significantly outperforms baseline methods on several evaluation metrics. Furthermore, we find that BTM can outperform LDA even on normal texts, showing the potential generality and wider usage of the new topic model.
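To make the notion of corpus-level biterms concrete, the following is a minimal sketch, not the authors' implementation. It assumes a biterm is an unordered pair of words that co-occur in the same short document, and that tokenization is plain lowercase whitespace splitting; the function names `extract_biterms` and `corpus_biterm_counts` and the toy documents are hypothetical. It shows only the aggregation of word pairs across the corpus, not the topic inference itself.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one short document.

    Assumption: the whole document is treated as a single co-occurrence
    context, and each pair is sorted so (a, b) and (b, a) are the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate biterm counts over the whole corpus, not per document."""
    counts = Counter()
    for doc in docs:
        # Hypothetical tokenizer: lowercase and split on whitespace.
        counts.update(extract_biterms(doc.lower().split()))
    return counts


# Toy corpus of three short texts (illustrative data only).
docs = ["apple iphone launch", "iphone launch event", "apple store"]
counts = corpus_biterm_counts(docs)

# Each document alone yields very few pairs, but pooling them over the
# corpus lets repeated pairs such as ("iphone", "launch") accumulate.
for biterm, n in counts.most_common(3):
    print(biterm, n)
```

A topic model in this style would then treat the pooled biterm collection, rather than individual documents, as the data from which topics are learned.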

Similar Papers

Sparse Biterm Topic Model for Short Texts
Bingshan Zhu ... Huakui Zhang
01 Jan 2020

BTM: Topic Modeling over Short Texts
Xueqi Cheng ... Yanyan Lan
IEEE Transactions on Knowledge and Data Engineering | VOL. 26
01 Dec 2014

GPU-BTM: A Topic Model for Short Text using Auxiliary Information
Yibing Guo ... Yutao Huang
01 Jul 2020

A Biterm-based Dirichlet Process Topic Model for Short Texts
Li Jing ... Yin Jian
01 Jan 2014
