Abstract

We introduce SPRITE, a family of topic models that incorporates structure into model priors as a function of underlying components. The structured priors can be constrained to model topic hierarchies, factorizations, correlations, and supervision, allowing SPRITE to be tailored to particular settings. We demonstrate this flexibility by constructing a SPRITE-based model to jointly infer topic hierarchies and author perspective, which we apply to corpora of political debates and online reviews. We show that the model learns intuitive topics, outperforming several other topic models at predictive tasks.

Highlights

  • Topic models can be a powerful aid for analyzing large collections of text by uncovering latent interpretable structures without manual supervision

  • Abstracts: A set of 957 abstracts from the ACL anthology (97,168 tokens; 8,246 types). These abstracts have previously been analyzed with Factorial LDA (FLDA) (Paul and Dredze, 2012), so we include this corpus to see whether the factored structure that we explore learns similar patterns

  • These results show that SPRITE is capable of recovering structures similar to those of FLDA, a more specialized model

Summary

Introduction

Topic models can be a powerful aid for analyzing large collections of text by uncovering latent interpretable structures without manual supervision. People often have expectations about the topics in a given corpus and how they should be structured for a particular task. It is crucial for the user experience that topics meet these expectations (Mimno et al., 2011; Talley et al., 2011), yet black-box topic models provide no control over the desired output. After describing the general form of the model, we show how SPRITE can be tailored to particular settings by describing a specific model for the applied task of jointly inferring topic hierarchies and perspective (§6). We experiment with this topic+perspective model on sets of political debates and online reviews (§7), and demonstrate that SPRITE learns the desired structures while outperforming many baselines at predictive tasks.
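The core idea of structured priors can be sketched concretely: instead of fixing a single Dirichlet hyperparameter vector, each document's prior over topics is computed as a function of shared underlying components. The snippet below is a minimal illustration of that general idea, not the paper's exact parameterization; the dimensions, the variable names, and the log-linear combination of components are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not taken from the paper).
n_docs, n_topics, n_components = 4, 6, 2

# Each document carries weights over a small set of shared components,
# and each component carries per-topic coefficients. The structured
# Dirichlet hyperparameters are a log-linear function of the two.
doc_component_weights = rng.normal(size=(n_docs, n_components))
component_topic_coefs = rng.normal(size=(n_components, n_topics))

# alpha_tilde[d, k] = exp(sum_c weights[d, c] * coefs[c, k])
# The exp keeps every hyperparameter positive, as a Dirichlet requires.
alpha_tilde = np.exp(doc_component_weights @ component_topic_coefs)

# Each document's topic distribution is drawn from its own structured prior,
# so documents with similar component weights get similar priors.
theta = np.vstack([rng.dirichlet(alpha_tilde[d]) for d in range(n_docs)])

print(theta.shape)  # (4, 6): one topic distribution per document
```

Constraining the component weights (e.g., sparsity, tree-structured sharing, or tying them to observed document attributes) is what yields the hierarchies, factorizations, and supervision described above.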

Topic Modeling with Structured Priors
Topic Structures
Directed Acyclic Graph
Factored Forest
Tying Topic and Document Components
Deep Components
Special Cases and Extensions
Latent Dirichlet Allocation
Shared Components Topic Models
Factored Topic Models
Topic Hierarchies and Correlations
Conditioning on Document Attributes
Inference and Parameter Estimation
Tightening the Constraints
A Factored Hierarchical Model of Topic and Perspective
Datasets and Experimental Setup
Analysis of Output
Quantitative Evaluation
Structure Comparison
Related Work
Findings
Discussion and Conclusion
