Abstract

In this work we study the distributed representations learnt by generative neural network models. In particular, we investigate the properties of redundant and synergistic information that groups of hidden neurons contain about the target variable. To this end, we use an emerging branch of information theory called partial information decomposition (PID) and track the informational properties of the neurons through training. We find two distinct phases during the training process: a short initial phase in which the neurons learn redundant information about the target, and a second phase in which the neurons specialise and each learns unique information about the target. We also find that in smaller networks individual neurons learn more specific information about certain features of the input, suggesting that learning pressure can encourage disentangled representations.
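
To make the decomposition concrete, here is a minimal sketch of a two-source PID using the Williams–Beer I_min redundancy measure. The specific PID measure used in the paper may differ, and the joint distribution p(t, x1, x2) over a target T and two hidden neurons is a placeholder; the XOR example at the end shows a purely synergistic case.

```python
# Two-source partial information decomposition (PID) sketch using the
# Williams-Beer I_min redundancy measure (the paper's exact measure may
# differ). p[t, x1, x2] is a hypothetical joint distribution over a
# target T and two hidden neurons X1, X2.
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits for a joint distribution p_xy[x, y]."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (px @ py)[mask])).sum())

def pid_min(p):
    """Williams-Beer PID of I(T; X1, X2) given p[t, x1, x2]."""
    pt = p.sum(axis=(1, 2))                  # p(t)
    joints = [p.sum(axis=2), p.sum(axis=1)]  # p(t, x1), p(t, x2)
    # Specific information I(T=t; Xi) = sum_x p(x|t) log2(p(t|x)/p(t)).
    spec = np.zeros((2, len(pt)))
    for i, ptx in enumerate(joints):
        px = ptx.sum(axis=0)                 # p(xi)
        for t in range(len(pt)):
            for x in range(ptx.shape[1]):
                if ptx[t, x] > 0:
                    spec[i, t] += (ptx[t, x] / pt[t]) * np.log2(
                        ptx[t, x] / (pt[t] * px[x]))
    # Redundancy: expected minimum specific information over the sources.
    redundancy = float((pt * spec.min(axis=0)).sum())
    i1 = mutual_information(joints[0])
    i2 = mutual_information(joints[1])
    i_joint = mutual_information(p.reshape(p.shape[0], -1))  # I(T; X1,X2)
    unique1, unique2 = i1 - redundancy, i2 - redundancy
    synergy = i_joint - redundancy - unique1 - unique2
    return redundancy, unique1, unique2, synergy

# Example: T = X1 XOR X2 with uniform inputs is purely synergistic.
p = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        p[x1 ^ x2, x1, x2] = 0.25
print(pid_min(p))  # ~ (0.0, 0.0, 0.0, 1.0): one bit, all synergy
```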

Highlights

  • Neural networks are famous for their excellent performance, yet infamous for their thin theoretical grounding

  • We used a stochastic binarised version of MNIST: every time an image was fed as input to the network, the value of each pixel was sampled from a Bernoulli distribution with probability equal to the normalised intensity of that pixel (see the sketch after this list)

  • The gradients were estimated with contrastive divergence [24] and the weights were optimised with vanilla stochastic gradient descent at a fixed learning rate (0.01); a sketch of one such update also follows below
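
A minimal sketch of the stochastic binarisation referenced in the highlights, assuming 8-bit grayscale inputs (the exact preprocessing pipeline is not specified on this page):

```python
import numpy as np

def binarise(image, rng):
    """Stochastically binarise a grayscale image: each pixel becomes 1
    with probability equal to its normalised intensity, resampled on
    every presentation of the image."""
    p = image.astype(np.float64) / 255.0      # normalised intensity in [0, 1]
    return (rng.random(p.shape) < p).astype(np.uint8)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)  # stand-in for an MNIST digit
sample = binarise(image, rng)  # a fresh binary sample on each call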

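As a concrete illustration of the training recipe in the last highlight, here is a minimal sketch of a single CD-1 update for a binary restricted Boltzmann machine with vanilla SGD at the stated learning rate. The RBM form and the layer sizes are assumptions for illustration, not details confirmed by this page.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_v, b_h, rng, lr=0.01):
    """One CD-1 update (contrastive divergence with a single Gibbs step)
    for a binary RBM, applied with plain SGD at a fixed learning rate.
    v0: (batch, n_visible) binary data; W: (n_visible, n_hidden)."""
    batch = v0.shape[0]
    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)
    # Negative phase: one Gibbs step to get the model's reconstruction.
    pv1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(pv1.shape) < pv1).astype(v0.dtype)
    ph1 = sigmoid(v1 @ W + b_h)
    # CD-1 gradient estimate: <v h>_data - <v h>_reconstruction.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

rng = np.random.default_rng(0)
n_visible, n_hidden = 784, 64             # hypothetical layer sizes
W = rng.normal(0, 0.01, (n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
v0 = (rng.random((32, n_visible)) < 0.5).astype(np.float64)  # stand-in batch
cd1_step(v0, W, b_v, b_h, rng)
```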

Introduction

Neural networks are famous for their excellent performance, yet infamous for their thin theoretical grounding. While common deep learning “tricks” that prove empirically successful are often later found to have a theoretical justification (e.g., the Bayesian interpretation of dropout [1,2]), deep learning research still operates “in the dark”, guided almost exclusively by empirical performance. One common topic in learning theory is the study of data representations, and in deep learning it is the hierarchy of such representations that is often hailed as the key to neural networks’ success [3]. A representation is said to be disentangled if it has a factorisable or compositional structure, with consistent semantics associated with the different generating factors of the underlying data-generation process.
