Abstract

Adaptive natural gradient learning avoids singularities in the parameter space of multilayer perceptrons. However, it requires many additional parameters beyond those of ordinary backpropagation, in the form of the Fisher information matrix. This paper describes a new approach to natural gradient learning that uses a smaller Fisher information matrix, together with a prior distribution on the network parameters and an annealed learning rate. While this new approach is computationally simpler, its performance is comparable to that of adaptive natural gradient learning.
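
As a concrete illustration of the update the abstract describes, here is a minimal sketch assuming a Gaussian prior on the weights (which contributes a weight-decay term) and a simple $1/t$ annealing schedule; the function name, schedule, and hyperparameter values are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def sngl_step(theta, grad_loss, fisher_inv, t,
              eta0=0.01, tau=100.0, weight_decay=1e-4):
    """One natural-gradient step with a Gaussian prior (weight decay)
    and an annealed learning rate.

    Illustrative sketch only: the 1/t schedule and the hyperparameter
    values are assumptions, not taken from the paper.
    """
    eta_t = eta0 / (1.0 + t / tau)               # annealed learning rate
    grad_map = grad_loss + weight_decay * theta  # loss gradient plus prior term
    return theta - eta_t * (fisher_inv @ grad_map)
```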

Highlights

  • Amari et al. developed the adaptive natural gradient learning (ANGL) algorithm for multilayer perceptrons [1,2,3]

  • The simplified natural gradient learning (SNGL) algorithm introduced in this paper uses a new formulation of the Fisher information matrix

  • For ANGL, the effective learning rate is the base learning rate multiplied by the smallest eigenvalue of the estimated Fisher information matrix (see the formula after this list)
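
Read as a formula, and interpreting "it" as the effective step size along the slowest-converging direction (an interpretation, since the surrounding context is not reproduced here), the last highlight states

$$\eta_{\mathrm{eff}} = \eta \, \lambda_{\min}\bigl(\hat{F}\bigr),$$

where $\eta$ is the base learning rate and $\lambda_{\min}(\hat{F})$ is the smallest eigenvalue of the estimated Fisher information matrix.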


Summary

Introduction

Amari et al. developed the adaptive natural gradient learning (ANGL) algorithm for multilayer perceptrons [1,2,3]. This section describes an exponential family in which each member has a probability density function of the network output errors. This exponential family is used to determine the direction of steepest descent, and hence the natural gradient, employed in the learning algorithm. If the inner product on the tangent space is defined as $\langle v, w \rangle = E_p\{vw\}$, where $E_p$ is the expectation operator with respect to the density function $p$, then the Riemannian metric is the Fisher information matrix, because the Fisher information between two score functions $s_i$ and $s_j$ is $g_{ij} = E_p\{s_i s_j\}$ [9,10,12]. These definitions are illustrated in the following example. ANGL [2] uses the matrix inversion lemma [12] to perform a rank-1 update on the inverse of the Fisher information matrix at each step. The ANGL algorithm performs well, but it needs sufficient memory to store a large matrix and is sensitive to initial conditions and learning rates.
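
The rank-1 inverse update can be made concrete with the Sherman-Morrison form of the matrix inversion lemma. The sketch below maintains the inverse of an exponentially forgotten Fisher estimate $\hat{F}_t = (1-\varepsilon)\hat{F}_{t-1} + \varepsilon\, g g^\top$, where $g$ is the current score (gradient) vector; this forgetting form and the parameter names are illustrative assumptions, not necessarily the exact ANGL recursion.

```python
import numpy as np

def update_inverse_fisher(F_inv, g, eps):
    """Rank-1 update of the inverse Fisher estimate via the
    Sherman-Morrison form of the matrix inversion lemma.

    Maintains the inverse of
        F_t = (1 - eps) * F_{t-1} + eps * g g^T
    in O(n^2) per step instead of the O(n^3) cost of re-inverting.
    The exponential-forgetting form is an illustrative assumption.
    """
    u = F_inv @ g                    # F_{t-1}^{-1} g
    c = float(g @ u)                 # g^T F_{t-1}^{-1} g
    # Sherman-Morrison applied to the convex combination above
    return (F_inv - (eps / ((1.0 - eps) + eps * c)) * np.outer(u, u)) / (1.0 - eps)
```

Each call costs $O(n^2)$ in the number of parameters $n$, which avoids refactorizing the matrix at every step but still requires storing the full $n \times n$ inverse; this is the memory burden noted above.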

A Simplified Natural Gradient Learning Algorithm
Experimental Results
Conclusion