Abstract

Topic models are widely used to summarize a corpus of documents. Recent advances in the Variational AutoEncoder (VAE) have enabled black-box inference methods for topic modeling that alleviate the drawbacks of classical statistical inference. Most existing VAE-based approaches assume a unimodal Gaussian distribution for the approximate posterior over the latent variables, which limits the flexibility of the latent encoding. In addition, their unsupervised architecture hinders the incorporation of extra label information, which is ubiquitous in many applications. In this paper, we propose a semi-supervised topic model under the VAE framework. We assume that a document is modeled as a mixture of classes, and a class as a mixture of latent topics; accordingly, a multimodal Gaussian mixture model is adopted for the latent space, with the parameters of the components and the mixing weights encoded separately. These weights, together with the partially labeled data, also contribute to the training of a classifier. The objective is derived under the Gaussian mixture assumption and the semi-supervised VAE framework, and the modules of the proposed framework are designed accordingly. Experiments on three benchmark datasets demonstrate the effectiveness of our method compared to several competitive baselines.
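To make the encoding scheme concrete, here is a minimal sketch, not the authors' implementation, of an inference network that maps a bag-of-words vector to a K-component Gaussian mixture posterior, with separate heads for the component parameters and the mixing weights as the abstract describes, plus a decoder reflecting the class-to-topics-to-words hierarchy. The framework (PyTorch), all class and function names, and the soft reparameterized mixture sample are illustrative assumptions.

```python
# Minimal sketch (assumptions throughout): separate heads for component
# parameters (mu_k, logvar_k) and mixing weights pi, per the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMMEncoder(nn.Module):
    def __init__(self, vocab_size, hidden_dim, latent_dim, n_components):
        super().__init__()
        self.n_components, self.latent_dim = n_components, latent_dim
        self.backbone = nn.Sequential(nn.Linear(vocab_size, hidden_dim),
                                      nn.Softplus())
        # One head per quantity: means, log-variances, mixing weights.
        self.mu_head = nn.Linear(hidden_dim, n_components * latent_dim)
        self.logvar_head = nn.Linear(hidden_dim, n_components * latent_dim)
        self.pi_head = nn.Linear(hidden_dim, n_components)

    def forward(self, bow):
        h = self.backbone(bow)
        mu = self.mu_head(h).view(-1, self.n_components, self.latent_dim)
        logvar = self.logvar_head(h).view(-1, self.n_components, self.latent_dim)
        pi = F.softmax(self.pi_head(h), dim=-1)   # mixing weights, one per class
        return mu, logvar, pi

def sample_latent(mu, logvar, pi):
    # Reparameterized draw: per-component samples combined with soft weights.
    # A differentiable relaxation of mixture sampling; the paper may use a
    # different scheme (e.g. sampling the component index).
    eps = torch.randn_like(mu)
    z_k = mu + torch.exp(0.5 * logvar) * eps      # (batch, K, latent_dim)
    return (pi.unsqueeze(-1) * z_k).sum(dim=1)    # (batch, latent_dim)

class TopicDecoder(nn.Module):
    # Latent code -> topic proportions -> word probabilities, mirroring the
    # assumed class -> topics -> words hierarchy (topic matrix is hypothetical).
    def __init__(self, latent_dim, n_topics, vocab_size):
        super().__init__()
        self.to_topics = nn.Linear(latent_dim, n_topics)
        self.topic_word = nn.Linear(n_topics, vocab_size, bias=False)

    def forward(self, z):
        theta = F.softmax(self.to_topics(z), dim=-1)       # topic proportions
        return F.softmax(self.topic_word(theta), dim=-1)   # word distribution
```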

Highlights

  • Topic models [1], [2] provide us with methods to discover abstract word and phrase patterns that best summarize and characterize a corpus of documents

  • Variational AutoEncoder (VAE) [16] provides a framework to alleviate the limitations of classical statistical inference, by training an inference network that maps document representations directly to an approximate posterior distribution

  • We study topic modeling under the framework of VAE


Summary

INTRODUCTION

Topic models [1], [2] provide us with methods to discover abstract word and phrase patterns that best summarize and characterize a corpus of documents. The Variational AutoEncoder (VAE) [16] provides a framework to alleviate the limitations of classical inference, by training an inference network that maps document representations directly to an approximate posterior distribution. We propose a Semi-supervised Variational AutoEncoder with Gaussian Mixture posteriors (S-VAE-GM) to address the above challenges in topic modeling. Each class label, which is observed for a subset of the data, corresponds to a Gaussian component with its own parameters; this assumption means that topics weigh differently for different classes. The Gaussian mixture model for the latent space is a natural fit for the assumed class-topic hierarchy in a document, and it alleviates the unimodal limitation. Experiments performed on three standard datasets demonstrate the effectiveness of our model.
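The semi-supervised objective pairs the usual reconstruction term with a classification loss on the labeled subset, and the KL divergence between two Gaussian mixtures has no closed form (hence the "KLD between two GMMs" section below). The following sketch, reusing the GMMEncoder outputs above, shows one common Monte Carlo treatment of that KL term and a masked classification loss over the mixing weights; the one-sample estimate and all function names are illustrative assumptions, not the paper's exact derivation.

```python
# Sketch of the semi-supervised pieces (assumed, not the authors' objective).
import math
import torch
import torch.nn.functional as F

def gmm_log_prob(z, mu, logvar, pi):
    # log density of z under a diagonal-covariance GMM, via log-sum-exp.
    z = z.unsqueeze(1)                                       # (B, 1, D)
    log_norm = -0.5 * (logvar + (z - mu) ** 2 / logvar.exp()
                       + math.log(2 * math.pi))
    log_comp = log_norm.sum(-1) + torch.log(pi + 1e-10)      # (B, K)
    return torch.logsumexp(log_comp, dim=1)                  # (B,)

def mc_kl_gmm(z, q_params, p_params):
    # One-sample Monte Carlo estimate of KL(q || p) at the drawn latent z,
    # since the GMM-to-GMM KL has no closed form.
    return gmm_log_prob(z, *q_params) - gmm_log_prob(z, *p_params)

def classification_loss(pi, labels, labeled_mask):
    # Mixing weights double as class probabilities; supervise labeled docs only.
    nll = F.nll_loss(torch.log(pi + 1e-10), labels, reduction="none")
    return (nll * labeled_mask).sum() / labeled_mask.sum().clamp(min=1)
```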

RELATED WORK
BASIC ASSUMPTIONS
IMPLEMENTATION DETAILS
KLD BETWEEN TWO GMMs
EXPERIMENTAL SETUPS
EVALUATION METRICS
BASELINES
SETTINGS
EXPERIMENTAL RESULTS
CONCLUSION