SaberLDA

Kaiwei Li, Jun Zhu, Jianfei Chen, Wenguang Chen

doi:10.1145/3037697.3037740

Abstract

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm that together improve locality, make efficient use of GPU warps, and reduce memory consumption. Experiments show that SaberLDA can learn from billion-token-scale data with up to 10,000 topics, which is almost two orders of magnitude more topics than previous GPU-based systems support. With a single GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of tokens in a few hours, which was previously achievable only with clusters of tens of machines.
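The abstract does not spell out the algorithm, so the following is only an illustrative sketch of why exploiting sparsity can make per-token sampling cost sublinear in the number of topics K. It assumes a topic conditional of the form p(k) proportional to (n_dk + alpha) * phi_w[k], which splits into a sparse part (only topics with nonzero document count n_dk) and a dense prior part that is shared by every occurrence of the same word. The function name, the argument names, and the use of a cumulative table with binary search are assumptions made for illustration; this is plain CPU Python, not SaberLDA's data layout or warp-based GPU kernel.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def sample_topic_sparse(doc_counts, phi_w, prior_cdf, rng):
    """Draw a topic with p(k) proportional to (n_dk + alpha) * phi_w[k].

    doc_counts: dict {topic: count} holding only the nonzero n_dk entries.
    phi_w:      list of length K, word-topic probabilities for this word.
    prior_cdf:  running sums of alpha * phi_w[k], built once per word.
    """
    # Sparse part: touch only the topics that occur in this document.
    topics = list(doc_counts)
    sparse_weights = [doc_counts[k] * phi_w[k] for k in topics]
    sparse_mass = sum(sparse_weights)
    prior_mass = prior_cdf[-1]

    u = rng.random() * (sparse_mass + prior_mass)
    if u < sparse_mass:
        # Linear walk over the nonzero entries: O(nnz), not O(K).
        for k, w in zip(topics, sparse_weights):
            u -= w
            if u < 0:
                return k
        return topics[-1]  # guard against floating-point round-off
    # Dense prior part: binary search on the shared table, O(log K).
    idx = bisect_right(prior_cdf, u - sparse_mass)
    return min(idx, len(prior_cdf) - 1)


# Hypothetical toy setup: 1,000 topics, a document that uses only two.
K = 1000
alpha = 0.01
phi_w = [1.0 / K] * K
prior_cdf = list(accumulate(alpha * p for p in phi_w))
rng = random.Random(0)
doc_counts = {3: 5, 7: 1}
draws = [sample_topic_sparse(doc_counts, phi_w, prior_cdf, rng)
         for _ in range(20000)]
```

In this toy setup each draw touches two document entries plus at most one binary search, whereas a dense sampler would evaluate all 1,000 topic weights per token.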

Similar Papers

SaberLDA
Kaiwei Li ... Wenguang Chen
ACM SIGARCH Computer Architecture News | VOL. 45
04 Apr 2017

SaberLDA
Kaiwei Li ... Wenguang Chen
ACM SIGPLAN Notices | VOL. 52
04 Apr 2017

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs
Kaiwei Li ... Wenguang Chen
IEEE Transactions on Parallel and Distributed Systems | VOL. 31
01 Sep 2020

Models, Inference, and Implementation for Scalable Probabilistic Models of Text
01 Jan 2014
