Abstract

We study the dynamics of information processing in the continuum depth limit of deep feed-forward Neural Networks (NN) and find that it can be described in a language similar to the Renormalization Group (RG). The association of concepts to patterns by a NN is analogous to the identification of the few variables that characterize the thermodynamic state, obtained by the RG from microstates. To see this, we encode the information about the weights of a NN in a Maxent family of distributions. The location hyper-parameters represent the estimates of the weights. Bayesian learning of a new example determines new constraints on the generators of the family, yielding a new probability distribution; this can be seen as an entropic dynamics of learning, in which the hyper-parameters change along the gradient of the evidence. For a feed-forward architecture the evidence can be written recursively from the evidence up to the previous layer, convolved with an aggregation kernel. The continuum limit leads to a diffusion-like PDE analogous to Wilson's RG, but with an aggregation kernel that depends on the weights of the NN, unlike the RG kernels that integrate out ultraviolet degrees of freedom. This can be recast in the language of dynamic programming, with an associated Hamilton–Jacobi–Bellman equation for the evidence, where the control is the set of weights of the neural network.
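To fix notation for the scheme just described (the symbols below are our own shorthand, not necessarily those used in the paper), write $\theta$ for the location hyper-parameters, $Z_\ell$ for the evidence accumulated up to layer $\ell$, and $K_w$ for the weight-dependent aggregation kernel. The learning dynamics and the layer recursion then take the schematic form

\[
\dot{\theta} \;\propto\; \nabla_{\theta} \log Z,
\qquad
Z_{\ell+1}(x) \;=\; \int dx'\, K_{w}(x, x')\, Z_{\ell}(x'),
\]

and in the continuum-depth limit the layer recursion becomes a diffusion-like PDE for $Z$; extremizing over the weights $w$, treated as controls, yields the associated Hamilton–Jacobi–Bellman equation.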

Highlights

  • Neural networks are information processing systems that learn from examples [1]. Loosely inspired by biological neural systems, they have been used for several types of problems, such as classification, regression, dimensionality reduction and clustering [2].

  • In this paper we study Learning by Entropic Dynamics in Neural Network Architectures (EDNNA), which generalizes variational methods that have been used to obtain on-line optimal learning algorithms that saturate Bayesian bounds.

  • In this article we point out the relation between the Renormalization Group and information processing in a class of neural networks.

Summary

Introduction

Neural networks are information processing systems that learn from examples [1]. Loosely inspired by biological neural systems, they have been used for several types of problems, such as classification, regression, dimensionality reduction and clustering [2]. Although learning depends on a large number of example pairs, on-line accuracy performance remains high. The strategy to determine optimized algorithms for on-line learning has been studied in the past for restricted scenarios and architectures. In this paper we study Learning by Entropic Dynamics in Neural Network Architectures (EDNNA), which generalizes variational methods that have been used to obtain on-line optimal learning algorithms that saturate Bayesian bounds. An approximation to this scheme was found for simple networks with no hidden units using a variational procedure [9]. Networks with a single hidden layer have also been studied; the challenge remains to treat networks with deep architectures, which motivates this study.
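To make the on-line setting concrete, the sketch below is a generic illustration, not the EDNNA algorithm of this paper: it tracks a Gaussian weight distribution for a perceptron with no hidden units, in the spirit of the variational scheme of [9], moving the location and variance hyper-parameters with the gradient and curvature of the per-example log-evidence. The teacher/student setup, the diagonal covariance and all names below are assumptions made for illustration only.

    # Minimal sketch (assumed for illustration, not the paper's EDNNA scheme):
    # on-line Gaussian learning for a perceptron with no hidden units.
    # The weight distribution is a Maxent (Gaussian) family with location m and
    # variances s; each example updates (m, s) via the gradient and curvature of
    # the per-example log-evidence log Phi(y m.x / sqrt(1 + x.S.x)).
    import numpy as np
    from math import erf, exp, pi, sqrt

    def Phi(z):                    # standard normal CDF (with a numerical floor)
        return max(0.5 * (1.0 + erf(z / sqrt(2.0))), 1e-12)

    def phi(z):                    # standard normal PDF
        return exp(-0.5 * z * z) / sqrt(2.0 * pi)

    def online_update(m, s, x, y):
        """One on-line step for example (x, y), with label y in {-1, +1}."""
        v = 1.0 + np.dot(s * x, x)                # predictive variance of the local field
        z = y * np.dot(m, x) / np.sqrt(v)         # normalized field
        r = phi(z) / Phi(z)                       # pdf/cdf ratio
        m_new = m + s * x * (y * r / np.sqrt(v))  # move along the gradient of the log-evidence
        s_new = s - (s * x) ** 2 * r * (z + r) / v    # curvature shrinks the variances
        return m_new, np.clip(s_new, 1e-8, None)

    rng = np.random.default_rng(0)
    d = 20
    w_teacher = rng.standard_normal(d)            # hypothetical teacher rule
    m, s = np.zeros(d), np.ones(d)                # prior: standard Gaussian weights
    for _ in range(2000):
        x = rng.standard_normal(d)
        y = np.sign(w_teacher @ x)
        m, s = online_update(m, s, x, y)
    overlap = m @ w_teacher / (np.linalg.norm(m) * np.linalg.norm(w_teacher))
    print(f"teacher-student overlap after 2000 on-line examples: {overlap:.3f}")

Under this kind of update the variances contract as evidence accumulates; it is a single-layer counterpart of the hyper-parameter dynamics along the gradient of the evidence described in the abstract.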

Outline
Feed-Forward Architectures
The Renormalization Group
Maxent Distributions and Bayesian Learning
Deep Multilayer Perceptron
Discussion