Abstract

Dynamic Topic Modeling (DTM) extracts topics from the short texts generated in Online Social Networks (OSNs) such as Twitter. A viable DTM solution must be scalable and must account for the sparsity and dynamicity of short texts. Current solutions combine probabilistic mixture models, such as the Dirichlet Multinomial or the Pitman-Yor Process, with approximate inference approaches, such as Gibbs Sampling and Stochastic Variational Inference, to address the dynamicity and scalability of DTM, respectively. However, these methods rely on weak probabilistic language models that do not account for the sparsity of short texts. In addition, their inference is based on iterative optimization, which raises scalability issues for DTM. We present GDTM, a single-pass graph-based DTM algorithm, to solve this problem. GDTM combines a context-rich, incremental feature representation method with graph partitioning to address scalability and dynamicity, and uses a rich language model to account for sparsity. We run multiple experiments over a large-scale Twitter dataset to analyze the accuracy and scalability of GDTM and compare the results with four state-of-the-art models. GDTM outperforms the best model by 11% on accuracy and runs an order of magnitude faster, while producing four times better topic quality on standard evaluation metrics.

Highlights

  • Topic modeling [1] is the problem of automatically classifying the words that form the context of documents into similarity groups, known as topics

  • Given a dataset D with n documents tagged with k hand labels, L = {l1, . . . , lk}, and a classification of the documents into k class labels, C = {c1, . . . , ck}, we evaluate accuracy using the B-Cubed score of each document d with hand label ld and class label cd

  • We demonstrate the accuracy and scalability of GDTM by running the algorithm over two sets of experiments

  • We developed GDTM, a solution for dynamic topic modeling on short texts in online social networks
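The B-Cubed score referenced in the highlights is a standard extrinsic clustering-evaluation metric. As a minimal sketch (the standard B-Cubed definition, not the paper's implementation; function and variable names are illustrative), the per-document precision and recall can be computed as follows:

```python
def b_cubed(hand_labels, class_labels):
    """Compute B-Cubed precision, recall and F1 for a clustering.

    hand_labels[i] is the gold (hand) label of document i;
    class_labels[i] is the cluster (class) label assigned to document i.
    """
    n = len(hand_labels)
    precision = recall = 0.0
    for i in range(n):
        # documents that share document i's cluster / gold label
        same_cluster = [j for j in range(n) if class_labels[j] == class_labels[i]]
        same_label = [j for j in range(n) if hand_labels[j] == hand_labels[i]]
        # correctly co-clustered documents: same cluster AND same gold label
        correct = sum(1 for j in same_cluster if hand_labels[j] == hand_labels[i])
        precision += correct / len(same_cluster)
        recall += correct / len(same_label)
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A perfect clustering yields precision = recall = 1.0; merging all documents into one cluster keeps recall at 1.0 but lowers precision, which is why B-Cubed is a common choice for comparing topic-model output against hand labels.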

Summary

Introduction

Motivation

Topic modeling [1] is the problem of automatically classifying the words that form the context of documents into similarity groups, known as topics. Documents generated in today's social media (like Twitter or Facebook) are (i) fast (large-scale and continuous), (ii) sparse (short in length) and (iii) dynamic (with the constant emergence of newly generated phrases and context structures). Modeling topics under these conditions is the problem known as Dynamic Topic Modeling (DTM). A legitimate DTM solution must continuously receive a large number of short texts, extract their topics and adapt to changes in those topics

Results
Discussion
Conclusion
