Learning author-topic models from text corpora

Michal Rosen-Zvi, Chaitanya Chemudugunta, Mark Steyvers, Padhraic Smyth, Thomas Griffiths

doi:10.1145/1658377.1658381

Abstract

We propose an unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to three large text corpora: 150,000 abstracts from the CiteSeer digital library, 1,740 papers from the Neural Information Processing Systems (NIPS) Conferences, and 121,000 emails from the Enron Corporation. We discuss in detail the interpretation of the results discovered by the system, including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors. Experiments based on perplexity scores for test documents and precision-recall for document retrieval are used to illustrate systematic differences between the proposed author-topic model and a number of alternatives. Extensions to the model that allow, for example, generalizations of the notion of an author are also briefly discussed.
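
The generative view described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not perform the Markov chain Monte Carlo learning step; it only samples words from distributions that are already given. The function names, the toy authors, topics, and vocabulary, and the choice to pick each word's author uniformly at random (which makes the document's topic distribution an equal-weight mixture over its authors) are all assumptions made for illustration.

```python
import random


def document_topic_mixture(authors, theta):
    """Topic distribution of a document as an equal-weight mixture
    of its authors' topic distributions (equal weights are an assumption)."""
    n_topics = len(theta[authors[0]])
    return [sum(theta[a][k] for a in authors) / len(authors)
            for k in range(n_topics)]


def sample_document(authors, theta, phi, vocab, n_words, rng):
    """Sample the words of one document, given fixed distributions.

    authors: list of author ids for this document
    theta:   dict mapping author id -> list of topic probabilities
    phi:     list (one entry per topic) of word-probability lists over vocab
    rng:     a random.Random instance, passed in for reproducibility
    """
    words, assignments = [], []
    for _ in range(n_words):
        # Pick one of the document's authors uniformly at random (assumed).
        author = rng.choice(authors)
        # Pick a topic from that author's distribution over topics.
        topic = rng.choices(range(len(phi)), weights=theta[author])[0]
        # Pick a word from that topic's distribution over words.
        word = rng.choices(vocab, weights=phi[topic])[0]
        words.append(word)
        assignments.append((author, topic))
    return words, assignments


if __name__ == "__main__":
    # Hypothetical toy data: two authors, two topics, four words.
    vocab = ["neuron", "network", "market", "price"]
    theta = {"author_a": [0.9, 0.1], "author_b": [0.2, 0.8]}
    phi = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    doc_authors = ["author_a", "author_b"]
    print(document_topic_mixture(doc_authors, theta))
    print(sample_document(doc_authors, theta, phi, vocab, 10, random.Random(0))[0])
```

Learning runs in the opposite direction: given only the observed words and author lists, the distributions held in theta and phi have to be estimated, which is the part the abstract assigns to the Markov chain Monte Carlo algorithm.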

Journal: ACM Transactions on Information Systems
Publication Date: Jan 1, 2010
Citations: 316

Similar Papers

Probabilistic author-topic models for information discovery
Mark Steyvers ... Michal Rosen-Zvi
22 Aug 2004

A author topic model based unsupervised algorithm for learning topics from large text collections
S Mercy Shalinie ... S Pushparathi
01 Jun 2011

A Hierarchical Bayesian Model for Text Corpora
Peng Han ... Ying Nan Zhang
Applied Mechanics and Materials | VOL. 687-691
01 Nov 2014

Unsupervised Concept Hierarchy Learning: A Topic Modeling Guided Approach
V.S Anoop ... P Deepak
Procedia Computer Science | VOL. 89
01 Jan 2015
