Abstract

We introduce SPRITE, a family of topic models that incorporates structure into model priors as a function of underlying components. The structured priors can be constrained to model topic hierarchies, factorizations, correlations, and supervision, allowing SPRITE to be tailored to particular settings. We demonstrate this flexibility by constructing a SPRITE-based model to jointly infer topic hierarchies and author perspective, which we apply to corpora of political debates and online reviews. We show that the model learns intuitive topics, outperforming several other topic models at predictive tasks.

Highlights

  • Topic models can be a powerful aid for analyzing large collections of text by uncovering latent interpretable structures without manual supervision

  • Abstracts: A set of 957 abstracts from the ACL anthology (97,168 tokens; 8,246 types). These abstracts have previously been analyzed with Factorial LDA (FLDA) (Paul and Dredze, 2012), so we include this corpus to see whether the factored structure that we explore learns similar patterns

  • These results show that SPRITE is capable of recovering structures similar to those of FLDA, a more specialized model

Summary

Introduction

Topic models can be a powerful aid for analyzing large collections of text by uncovering latent interpretable structures without manual supervision. People often have expectations about the topics in a given corpus and how they should be structured for a particular task. It is crucial for the user experience that topics meet these expectations (Mimno et al., 2011; Talley et al., 2011), yet black-box topic models provide no control over the desired output. After describing the general form of the model, we show how SPRITE can be tailored to particular settings by describing a specific model for the applied task of jointly inferring topic hierarchies and perspective (§6). We experiment with this topic+perspective model on sets of political debates and online reviews (§7), and demonstrate that SPRITE learns the desired structures while outperforming many baselines at predictive tasks.
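The core idea of structured priors can be sketched concretely: instead of fixing a single Dirichlet hyperparameter vector, each document's prior over topics is computed as a function of shared underlying components. The snippet below is a minimal illustration of that general idea, not the paper's exact parameterization; the dimensions, the variable names, and the log-linear combination of components are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not taken from the paper).
n_docs, n_topics, n_components = 4, 6, 2

# Each document carries weights over a small set of shared components,
# and each component carries per-topic coefficients. The structured
# Dirichlet hyperparameters are a log-linear function of the two.
doc_component_weights = rng.normal(size=(n_docs, n_components))
component_topic_coefs = rng.normal(size=(n_components, n_topics))

# alpha_tilde[d, k] = exp(sum_c weights[d, c] * coefs[c, k])
# The exp keeps every hyperparameter positive, as a Dirichlet requires.
alpha_tilde = np.exp(doc_component_weights @ component_topic_coefs)

# Each document's topic distribution is drawn from its own structured prior,
# so documents with similar component weights get similar priors.
theta = np.vstack([rng.dirichlet(alpha_tilde[d]) for d in range(n_docs)])

print(theta.shape)  # (4, 6): one topic distribution per document
```

Constraining the component weights (e.g., sparsity, tree-structured sharing, or tying them to observed document attributes) is what yields the hierarchies, factorizations, and supervision described above.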

Topic Modeling with Structured Priors
Topic Structures
Directed Acyclic Graph
Factored Forest
Tying Topic and Document Components
Deep Components
Special Cases and Extensions
Latent Dirichlet Allocation
Shared Components Topic Models
Factored Topic Models
Topic Hierarchies and Correlations
Conditioning on Document Attributes
Inference and Parameter Estimation
Tightening the Constraints
A Factored Hierarchical Model of Topic and Perspective
Datasets and Experimental Setup
Analysis of Output
Quantitative Evaluation
Structure Comparison
Related Work
Findings
Discussion and Conclusion
