Abstract

In this era of big data, many domains on the web naturally accumulate massive amounts of labeled text data that grow over time, for example, digital publication archives, social media posts, and question-answer forums. Probabilistic graphical models have shown great potential for mining such text corpora in recent years. Some of these approaches use explicit annotations and labels associated with documents to guide the probabilistic model toward hidden themes. A few techniques use the timestamps associated with documents to model the evolution of those latent topics. However, no effort has been devoted to using these two dimensions of information together (timestamps, and labels or annotations) to discover the evolution of labeled themes. In this paper, we present a new topic model called the Supervised Topical Evolution Model (STEM), a monolithic graphical model that uses annotations, timestamps, and textual content to discover interpretable and evolving themes in large text datasets. STEM simultaneously learns latent themes and their changes over time using a stochastic process driven by labels or annotations. In addition, we provide an asynchronous distributed inference procedure for STEM that significantly speeds up learning, making the model scalable to large datasets. Extensive experiments demonstrate that the proposed model infers highly interpretable topics that reflect temporal patterns, in much less time than comparable topic modeling methods.

