Abstract
We introduce SPRITE, a family of topic models that incorporates structure into model priors as a function of underlying components. The structured priors can be constrained to model topic hierarchies, factorizations, correlations, and supervision, allowing SPRITE to be tailored to particular settings. We demonstrate this flexibility by constructing a SPRITE-based model to jointly infer topic hierarchies and author perspective, which we apply to corpora of political debates and online reviews. We show that the model learns intuitive topics, outperforming several other topic models at predictive tasks.
Highlights
Topic models can be a powerful aid for analyzing large collections of text by uncovering latent interpretable structures without manual supervision.
Abstracts: A set of 957 abstracts from the ACL anthology (97,168 tokens; 8,246 types). These abstracts have previously been analyzed with Factorial LDA (FLDA) (Paul and Dredze, 2012), so we include this dataset to see whether the factored structure that we explore learns similar patterns.
These results show that SPRITE is capable of recovering structures similar to those found by FLDA, a more specialized model.
Summary
Topic models can be a powerful aid for analyzing large collections of text by uncovering latent interpretable structures without manual supervision. People often have expectations about topics in a given corpus and how they should be structured for a particular task. It is crucial for the user experience that topics meet these expectations (Mimno et al., 2011; Talley et al., 2011), yet black-box topic models provide no control over the desired output. After describing the general form of the model, we show how SPRITE can be tailored to particular settings by describing a specific model for the applied task of jointly inferring topic hierarchies and perspective (§6). We experiment with this topic+perspective model on sets of political debates and online reviews (§7), and demonstrate that SPRITE learns desired structures while outperforming many baselines at predictive tasks.