WarpLDA

Jianfei Chen, Jun Zhu, Kaiwei Li, Wenguang Chen

doi:10.14778/2977797.2977801

Abstract

Developing efficient and scalable algorithms for Latent Dirichlet Allocation (LDA) is of wide interest for many applications. Previous work has developed an O(1) Metropolis-Hastings (MH) sampling method for each token. However, its performance is far from optimal because random accesses to the parameter matrices cause frequent cache misses. In this paper, we first carefully analyze the memory access behavior of existing LDA algorithms in terms of cache locality at the document level. We then develop WarpLDA, which achieves O(1) time complexity per token and fits the randomly accessed memory per document in the L3 cache. Our empirical results across a wide range of testing conditions demonstrate that WarpLDA is consistently 5-15x faster than the state-of-the-art MH-based LightLDA, and is faster than the state-of-the-art sparsity-aware F+LDA in most settings. WarpLDA learns a million topics from 639 million documents in only five hours, at an unprecedented throughput of 11 billion tokens per second.
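To illustrate why a Metropolis-Hastings step can cost O(1) per token regardless of the number of topics, here is a minimal Python sketch. It is not the WarpLDA or LightLDA sampler: the uniform proposal, the function names `mh_topic_step` and `run_chain`, and the toy weight vector are all assumptions made for illustration, and the abstract does not specify which proposals the real systems use. The point is only that each step draws one candidate and reads the unnormalized target at two indices, so no pass over all topics is needed.

```python
import random
from collections import Counter


def mh_topic_step(current, weights, rng):
    """One MH step over topics 0..K-1 using a uniform proposal (an assumption).

    The proposal is symmetric, so the acceptance probability reduces to
    min(1, weights[candidate] / weights[current]). Only two entries of
    `weights` are read, so the cost does not grow with K.
    """
    candidate = rng.randrange(len(weights))
    accept_prob = min(1.0, weights[candidate] / weights[current])
    return candidate if rng.random() < accept_prob else current


def run_chain(weights, steps, seed=0):
    """Run the chain and return the empirical frequency of each topic."""
    rng = random.Random(seed)
    topic = 0
    counts = Counter()
    for _ in range(steps):
        topic = mh_topic_step(topic, weights, rng)
        counts[topic] += 1
    return [counts[k] / steps for k in range(len(weights))]


# Hypothetical unnormalized topic weights for a single token (all positive).
weights = [1.0, 2.0, 3.0, 4.0]
freqs = run_chain(weights, 200_000)
```

Over many steps the visit frequencies in `freqs` should approach the normalized weights (here 0.1, 0.2, 0.3, 0.4), even though the chain never computes the normalizing constant. A uniform proposal mixes poorly when the target is concentrated on a few topics, which is one reason practical samplers choose more informed proposals.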

Journal: Proceedings of the VLDB Endowment
Publication Date: Jun 1, 2016
Citations: 58
