Sys-TM: A Fast and General Topic Modeling System

Yingxia Shao, Bin Cui, Yiru Chen, Lele Yu, Xupeng Li

doi:10.1109/tkde.2019.2956518

Abstract

Topic models, such as LDA and its variants, are popular probabilistic models for discovering the abstract “topics” that occur in a collection of documents. However, the performance of topic models can vary considerably across workloads, and achieving a well-optimized implementation is not a trivial task. In this paper, we systematically study all recently proposed samplers for LDA (AliasLDA, F+LDA, LightLDA, and WarpLDA) and discover a novel system tradeoff by considering the diversity and skewness of workloads. We then propose a hybrid sampler that chooses an efficient sampler according to this tradeoff, and apply it to LDA and its variants, including STM, TOT, and CTM. Finally, we build Sys-TM, a fast and general topic modeling system that provides a unified topic modeling framework by integrating the hybrid sampler. In our empirical studies, the hybrid sampler outperforms the state-of-the-art samplers by up to 2× across various topic models, and, with a carefully engineered implementation, Sys-TM outperforms existing systems by up to 10×.
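The abstract does not describe the internals of the samplers it compares. As general background only, several of them (AliasLDA, for example) reduce the per-token cost of drawing a topic by sampling from a precomputed alias table and correcting for staleness with Metropolis-Hastings steps. The sketch below is a textbook version of Vose's alias method, assuming an unnormalised weight vector over K topics; it is not the Sys-TM implementation, and the function names `build_alias_table` and `alias_draw` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def build_alias_table(weights: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Build Vose's alias table for an unnormalised discrete distribution.

    Construction is O(K) for K outcomes (e.g. topics); each draw is then O(1).
    """
    k = len(weights)
    total = float(sum(weights))
    if k == 0 or total <= 0:
        raise ValueError("weights must be non-empty with a positive sum")
    # Scale so that the average bucket holds exactly 1 unit of mass.
    scaled = [w * k / total for w in weights]
    prob = [0.0] * k
    alias = [0] * k
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Bucket s keeps its own mass and is topped up by outcome l.
        prob[s] = scaled[s]
        alias[s] = l
        # Outcome l gave away (1 - scaled[s]) of its mass.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        if scaled[l] < 1.0:
            small.append(l)
        else:
            large.append(l)
    # Leftover buckets are full (up to floating-point rounding).
    for i in large + small:
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob: List[float], alias: List[int], rng: random.Random) -> int:
    """Draw one outcome in O(1): pick a bucket uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


if __name__ == "__main__":
    # Hypothetical topic weights for a single word; not data from the paper.
    prob, alias = build_alias_table([1.0, 1.0, 2.0])
    rng = random.Random(42)
    draws = [alias_draw(prob, alias, rng) for _ in range(10000)]
    print([draws.count(t) / len(draws) for t in range(3)])
```

The design point is that the O(K) table build is amortised over many O(1) draws, which is why alias-based samplers tolerate a slightly stale proposal distribution and rely on Metropolis-Hastings to keep the chain correct.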

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Jun 1, 2021
Citations: 36
License type: publisher-specific-oa

Similar Papers

Investigating task performance of probabilistic topic models: an empirical study of PLSA and LDA
Yue Lu ... Qiaozhu Mei
Information Retrieval | VOL. 14 | 05 Aug 2010

LDA-Based Unified Topic Modeling for Similar TV User Grouping and TV Program Recommendation
Shinjee Pyo ... Eunhui Kim
IEEE Transactions on Cybernetics | VOL. 45 | 01 Oct 2014

A data-driven analysis to determine the optimal number of topics 'K' for latent Dirichlet allocation model
Astha Goyal ... Indu Kashyap
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 35 | 01 Jul 2024

A Unified Framework for Monolingual and Cross-Lingual Relevance Modeling Based on Probabilistic Topic Models
Ivan Vulić ... Marie-Francine Moens
01 Jan 2013