Abstract

Combining the information bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proven successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the deep variational information bottleneck and the assumptions needed for its derivation. The two assumed properties of the data, X and Y, and their latent representation T take the form of two Markov chains, T − X − Y and X − T − Y. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions p(X, Y, T). We, therefore, show how to circumvent this limitation by optimising a lower bound for the mutual information between T and Y, I(T; Y), for which only the latter Markov chain has to be satisfied. The mutual information I(T; Y) can be split into two non-negative parts. The first part is the lower bound for I(T; Y), which is optimised in the deep variational information bottleneck (DVIB) and cognate models in practice. The second part consists of two terms that measure how much the former requirement, T − X − Y, is violated. Finally, we propose interpreting the family of information bottleneck models as directed graphical models, and show that in this framework, the original and deep information bottlenecks are special cases of a fundamental IB model.
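Written out explicitly (a sketch in standard notation; the symbols follow the abstract, and the precise statements are given in the paper), the two chains correspond to conditional-independence factorisations of the joint distribution p(X, Y, T):

T − X − Y:  p(t | x, y) = p(t | x),  i.e.  p(x, y, t) = p(x) p(y | x) p(t | x);
X − T − Y:  p(y | x, t) = p(y | t),  i.e.  p(x, y, t) = p(x) p(t | x) p(y | t).

The first chain says the representation T is computed from X alone, while the second says Y is predicted from T alone, as in an encoder-decoder architecture.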

Highlights

  • Deep latent variable models, such as generative adversarial networks [1] and the variational autoencoder (VAE) [2], have attracted much interest in the last few years

  • The derivation of the deep variational information bottleneck model described in Section 2.2 uses the Markov assumption T − X − Y (last line of Equation (9), Figure 1a); see the sketch after this list for where this assumption enters

  • We showed how to lift the information bottleneck’s Markov assumption T − X − Y in the context of the deep information bottleneck model, in which X − T − Y holds by construction
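As a sketch of the step referenced in the second highlight (the notation is illustrative and the equation numbering follows the paper; q(y | t) denotes a variational decoder), the usual variational bound on I(T; Y) is obtained as

I(T; Y) = E_{p(y, t)}[ log p(y | t) / p(y) ]  ≥  E_{p(y, t)}[ log q(y | t) ] + H(Y),

where the inequality holds because the dropped term is a Kullback–Leibler divergence, KL(p(y | t) || q(y | t)) ≥ 0. To make the bound computable, p(y, t) is expanded through the encoder as p(y, t) = ∫ p(x) p(y | x) p(t | x) dx, and this last step is precisely where the assumption p(t | x, y) = p(t | x), i.e. the chain T − X − Y, is needed.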

Summary

Introduction

Deep latent variable models, such as generative adversarial networks [1] and the variational autoencoder (VAE) [2], have attracted much interest in the last few years. We clarify the relationship between the original information bottleneck (IB) and the deep variational information bottleneck by showing that it is possible to lift the original IB assumption in the context of the deep variational information bottleneck. This can be achieved by optimising a lower bound on the mutual information between T and Y, which follows naturally from the model's construction. The paper also contains a specification of the IB as a directed graphical model, in which the original and deep information bottlenecks appear as special cases of a fundamental IB model.
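As a rough illustration of how the deep variational information bottleneck is optimised in practice (a minimal sketch under common assumptions: a diagonal-Gaussian encoder, a standard-normal prior on T and a classification decoder; the architecture, layer sizes and the trade-off weight beta are illustrative and are not taken from the paper), the objective pairs a decoder log-likelihood term, i.e. the lower bound on I(T; Y), with a KL term that upper-bounds I(X; T):

import torch
import torch.nn as nn
import torch.nn.functional as F

class DVIB(nn.Module):
    """Minimal DVIB sketch: encoder q(t|x) is a diagonal Gaussian, decoder q(y|t) a classifier."""
    def __init__(self, x_dim=784, t_dim=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, 2 * t_dim))
        self.decoder = nn.Linear(t_dim, n_classes)

    def forward(self, x):
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        t = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterised sample from q(t|x)
        return self.decoder(t), mu, log_var

def dvib_loss(logits, y, mu, log_var, beta=1e-3):
    # Decoder term: negative of the lower bound on I(T; Y) (up to the constant H(Y)).
    recon = F.cross_entropy(logits, y)
    # Compression term: KL(q(t|x) || N(0, I)), a variational upper bound on I(X; T).
    kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=-1).mean()
    return recon + beta * kl

A training step draws a batch (x, y), computes logits, mu, log_var = model(x) and minimises dvib_loss(logits, y, mu, log_var); beta controls how strongly T is compressed. Note that X − T − Y holds here by construction, since the decoder sees only T.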

Related Work on the Deep Information Bottleneck Model
Information Bottleneck
Gaussian Information Bottleneck
Sparse Gaussian Information Bottleneck
Deep Variational Information Bottleneck
Bounds on Mutual Information in Deep Latent Variable Models
The Difference between Information Bottleneck Models
Motivation
Bound Derivation
Interpretation
The Original IB Assumption Revisited
Information Bottleneck as a Directed Graphical Model
Comparing IB and DVIB Assumptions
Conclusions