Abstract

Inspired by the adaptation phenomenon of neuronal firing, we propose regularity normalization (RN) as an unsupervised attention mechanism (UAM) that computes the statistical regularity in the implicit space of neural networks under the Minimum Description Length (MDL) principle. Treating the neural network optimization process as a partially observable model selection problem, regularity normalization constrains the implicit space by a normalization factor, the universal code length. We compute this universal code incrementally across neural network layers and demonstrate the flexibility to include data priors such as top-down attention and other oracle information. Empirically, our approach outperforms existing normalization methods in tackling limited, imbalanced and non-stationary input distributions in image classification, classic control, procedurally-generated reinforcement learning, generative modeling, handwriting generation and question answering tasks with various neural network architectures. Lastly, the unsupervised attention mechanism is a useful probing tool for neural networks, tracking the dependency and critical learning stages across layers and recurrent time steps of deep networks.
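
To make the mechanism concrete, below is a minimal NumPy sketch of the idea: each layer keeps a running maximum-likelihood estimate of its activation statistics, scores each incoming activation by its universal code length, and uses that code length as an attention gain. The class name `RegularityNorm`, the per-neuron Gaussian model class, and the exact update rules are illustrative assumptions for this sketch, not the published implementation.

```python
import numpy as np

class RegularityNorm:
    """Minimal sketch of regularity normalization for one layer.

    Assumes a Gaussian model class per neuron with running (incremental)
    maximum-likelihood estimates; the scaling form is a simplification.
    """

    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.comp = np.full(dim, eps)  # running normalizer over seen data
        self.count = 0
        self.eps = eps

    def __call__(self, x):
        # Incrementally update the per-neuron Gaussian MLE (Welford update).
        self.count += 1
        delta = x - self.mean
        self.mean = self.mean + delta / self.count
        self.var = self.var + (delta * (x - self.mean) - self.var) / self.count
        sigma = np.sqrt(np.maximum(self.var, 0.0)) + self.eps
        # Likelihood of the current activation under the running model.
        p = np.exp(-0.5 * ((x - self.mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        # Universal (NML-style) probability: normalize by the accumulated
        # likelihoods of the activations seen so far.
        self.comp = self.comp + p
        code_length = -np.log(p / self.comp + self.eps)
        # Scale each activation by its code length: less regular
        # (more surprising) activations receive a larger gain.
        return code_length * x
```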

Highlights

  • The Minimum Description Length (MDL) principle asserts that the best model given some data is the one that minimizes the combined cost of describing the model and describing the misfit between the model and the data [1], with the goal of maximizing regularity extraction for optimal data compression, prediction and communication [2].

  • If we consider the activations from each layer of a neural network as population codes, the constraint space can be subdivided into the input-vector space, the hidden-vector space, and the implicit space, which represents the underlying dimensions of variability in the other two spaces, i.e., a reduced representation of the constraint space.

  • LN+RN, a combined approach in which the regularity normalization is applied after the layer normalization (see the sketch after this list).
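
A hypothetical composition of the two, reusing the `RegularityNorm` sketch from the abstract above; the LN-then-RN ordering follows the highlight, while the helper names are ours:

```python
import numpy as np

def layer_norm(x, eps=1e-8):
    # Standard layer normalization over the feature dimension.
    return (x - x.mean()) / (x.std() + eps)

# LN+RN: layer-normalize first, then apply regularity normalization.
rn = RegularityNorm(dim=64)

def ln_rn(x):
    return rn(layer_norm(x))

out = ln_rn(np.random.randn(64))
```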


Summary

Introduction

The Minimum Description Length (MDL) principle asserts that the best model given some data is the one that minimizes the combined cost of describing the model and describing the misfit between the model and the data [1], with the goal of maximizing regularity extraction for optimal data compression, prediction and communication [2]. If we consider neural network training as the optimization process of a communication system, each input at each layer of the system can be described as a point in a low-dimensional continuous constraint space [4]. The minimum code length given any arbitrary $\theta$ would be $L(x \mid \hat{\theta}(x)) = -\log P(x \mid \hat{\theta}(x))$, with the model $\hat{\theta}(x)$ that compresses the data sample $x$ most efficiently and offers the maximum likelihood $P(x \mid \hat{\theta}(x))$ [2]. The compressibility of the model, computed as the minimum code length, can be unattainable when multiple non-i.i.d. data samples arrive as individual inputs, because the distribution that most efficiently represents a given sample $x$ under a model class varies from sample to sample. The solution relies on the existence of a universal code, $P(x)$, defined for a model class $\Theta$ such that, for any data sample $x$, the shortest code for $x$ is always $L(x \mid \hat{\theta}(x))$, as shown in [27].
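
The standard construction of such a universal code is the normalized maximum likelihood (NML); writing it out makes the normalization factor explicit:

$$
P_{\mathrm{NML}}(x) = \frac{P(x \mid \hat{\theta}(x))}{\sum_{x'} P(x' \mid \hat{\theta}(x'))},
\qquad
L_{\mathrm{NML}}(x) = -\log P(x \mid \hat{\theta}(x)) + \log \sum_{x'} P(x' \mid \hat{\theta}(x')).
$$

The second term, the log-sum of maximum likelihoods over the sample space, does not depend on $x$; it is the normalization factor that regularity normalization estimates incrementally across training steps, so the code for every sample stays within this constant of the otherwise unattainable per-sample optimum $L(x \mid \hat{\theta}(x))$.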

