Abstract

In this paper, we develop an unsupervised generative clustering framework that combines the variational information bottleneck (VIB) method with a Gaussian mixture model of the latent space. We derive a bound on the cost function of our model that generalizes the Evidence Lower Bound (ELBO), and we provide a variational-inference-type algorithm to compute it. In the algorithm, the coders' mappings are parametrized by neural networks, and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on real datasets support the efficiency of our method.
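
To make the objective concrete, the display below sketches the typical form of a variational IB cost with a Gaussian-mixture prior. The symbols (encoder $q_\phi(u\mid x)$, decoder $p_\theta(x\mid u)$, trade-off parameter $s$, mixture prior $p(u)$) are our illustrative choices for this summary, not notation taken verbatim from the paper:

$$\mathcal{L}_s \;=\; \mathbb{E}_{q_\phi(u\mid x)}\big[\log p_\theta(x\mid u)\big] \;-\; s\, D_{\mathrm{KL}}\big(q_\phi(u\mid x)\,\big\|\,p(u)\big), \qquad p(u)=\sum_{c=1}^{C}\pi_c\,\mathcal{N}(u;\mu_c,\Sigma_c).$$

Setting $s = 1$ recovers the standard ELBO of a Gaussian-mixture variational autoencoder, which is consistent with the claim that the derived bound generalizes the ELBO.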

Highlights

  • Clustering consists of partitioning a given dataset into groups based on some similarity metric, such as the Euclidean (L2) distance, the L1 or L∞ norm, the popular logarithmic-loss measure, or others

  • A key aspect is how to design a latent space that is amenable to accurate low-complexity unsupervised clustering, i.e., one that preserves only those features of the observed high-dimensional data that are useful for clustering while removing all redundant or non-relevant information

  • We provide a general cost function for the unsupervised clustering problem studied here based on the variational Information Bottleneck (IB) framework, and we show that it generalizes the Evidence Lower Bound (ELBO) developed in [19]


Summary

Introduction

Clustering consists of partitioning a given dataset into groups (clusters) based on some similarity metric, such as the Euclidean (L2) distance, the L1 or L∞ norm, the popular logarithmic-loss measure, or others. A key aspect is how to design a latent space that is amenable to accurate low-complexity unsupervised clustering, i.e., one that preserves only those features of the observed high-dimensional data that are useful for clustering while removing all redundant or non-relevant information. To achieve high clustering accuracy: (i) we derive a cost function that contains the IB hyperparameter s, which controls the trade-off between the accuracy and the regularization of the model; (ii) we use a lower-bound approximation for the KL term in the cost function that does not depend on the clustering assignment probability (the clustering assignments are usually inaccurate at the beginning of training); and (iii) we tune the hyperparameter s following an annealing approach that improves both the convergence and the accuracy of the proposed algorithm.
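
Ingredients (i) and (iii) can be made concrete with a short sketch. The geometric schedule shape, the function names, and all constants below are our own illustrative assumptions, a minimal sketch rather than the paper's implementation:

```python
# Hedged sketch of (iii): annealing the IB hyperparameter s during training.
# The geometric ramp and all constants are illustrative assumptions.

def annealed_s(epoch: int, n_epochs: int, s_init: float = 1e-3,
               s_final: float = 1.0) -> float:
    """Geometrically ramp s from s_init up to s_final over training."""
    frac = epoch / max(n_epochs - 1, 1)
    return s_init * (s_final / s_init) ** frac

def vib_cost(recon_loglik: float, kl_term: float, s: float) -> float:
    """Sketch of (i): negative reconstruction log-likelihood plus s times
    the KL regularizer (which (ii) replaces by a lower-bound surrogate)."""
    return -(recon_loglik - s * kl_term)

# A small weight early on lets the coders focus on reconstruction before
# the Gaussian-mixture prior starts shaping the latent space.
for epoch in (0, 10, 25, 49):
    print(f"epoch {epoch:2d}: s = {annealed_s(epoch, n_epochs=50):.4f}")
```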

Proposed Model
Inference Network Model
Generative Network Model
Proposed Method
Brief Review of Variational Information Bottleneck for Unsupervised Learning
Proposed Algorithm
Effect of the Hyperparameter
Description of the Datasets Used
Network Settings and Other Parameters
Clustering Accuracy
Visualization on the Latent Space
Conclusions and Future Work
