Abstract
The abundance of data in the information age poses an immense challenge for us: how to perform large-scale inference to understand and utilize this overwhelming amount of information. Such techniques are of tremendous intellectual significance and practical impact. As part of this grand challenge, the goal of my Ph.D. thesis is to develop effective and efficient statistical topic models for massive text collections by incorporating extra information from other modalities in addition to the text itself. Text documents are not just text; different kinds of additional information are naturally interleaved with them. Most previous work, however, pays attention to only one modality at a time and ignores the others. In my thesis, I will present a series of probabilistic topic models to show how we can bridge multiple modalities of information, in a unified fashion, for various tasks. Interestingly, joint inference over multiple modalities leads to many findings that cannot be discovered from any one modality alone, as briefly illustrated below.

Email is pervasive nowadays. Much previous work in natural language processing has modeled text using latent topics while ignoring the underlying social network. Social network research, on the other hand, has mainly dealt with the existence of links between entities, without considering the language content or topics of those links. The author-recipient-topic (ART) model, by contrast, steers the discovery of topics according to the relationships between people, and learns topic distributions based on the direction-sensitive messages sent between entities.

However, the ART model does not explicitly identify groups formed by entities in the network, and previous work in social network analysis ignores the fact that different groupings arise for different topics. The group-topic (GT) model, a probabilistic generative model of entity relationships and textual attributes, simultaneously discovers groups among the entities and topics among the corresponding text.

Many large datasets do not have static latent structures; they are instead dynamic. The topics over time (TOT) model explicitly models time as an observed continuous variable. This allows TOT to capture long-range dependencies in time and helps avoid a Markov model's risk of inappropriately dividing a topic in two when there is a brief gap in its appearance. By treating time as a continuous variable, we also avoid the difficulties of discretization.

Most topic models, including all of the above, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text. The topical n-grams (TNG) model simultaneously discovers topics and meaningful topical phrases.

In summary, we believe that these models are clear evidence that we can better understand and utilize massive text collections when additional modalities are considered and modeled jointly with text.
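As a rough illustration of how time can enter a topic model as an observed continuous variable, the sketch below mirrors the generative story of a TOT-style model, in which each topic carries its own Beta distribution over normalized timestamps. It is a minimal, hypothetical example: the topic count, vocabulary size, hyperparameters, and function names are illustrative assumptions, not values or code from the thesis.

# Minimal sketch of a TOT-style generative story (illustrative only; not thesis code).
# Assumptions: K topics, V vocabulary words, timestamps normalized to [0, 1].
import numpy as np

rng = np.random.default_rng(0)

K, V = 5, 1000                                   # assumed number of topics and vocabulary size
alpha = np.full(K, 0.5)                          # Dirichlet prior over per-document topic mixtures
phi = rng.dirichlet(np.full(V, 0.01), size=K)    # per-topic word distributions, shape (K, V)
psi = rng.uniform(1.0, 5.0, size=(K, 2))         # per-topic Beta parameters over normalized time

def generate_document(n_words):
    """Draw one document: a topic mixture, then a (word, timestamp) pair per token."""
    theta = rng.dirichlet(alpha)                 # this document's topic mixture
    words, times = [], []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)               # topic assignment for this token
        words.append(rng.choice(V, p=phi[z]))    # word drawn from the chosen topic
        # Time is observed and continuous: each topic places a Beta density
        # over [0, 1], so no discretization of the timeline is required.
        times.append(rng.beta(psi[z, 0], psi[z, 1]))
    return words, times

words, times = generate_document(50)

Because the timestamp density is continuous, a topic that fades and later reappears is not artificially split at time-bin boundaries, which is the point the abstract makes about avoiding discretization.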