Abstract

The gradient descent method is an essential algorithm for training neural networks. Among the many variants of gradient descent developed to accelerate learning, natural gradient learning is based on the information geometry of the stochastic neuromanifold and is known to have ideal convergence properties. Despite its theoretical advantages, the pure natural gradient has limitations that prevent its practical use. Computing the exact natural gradient requires knowing the true probability distribution of the input variables and inverting a matrix whose dimension equals the number of parameters. Although adaptive estimation of the natural gradient has been proposed as a solution, it was originally developed for the online learning mode, which is computationally inefficient for learning from large data sets. In this paper, we propose a novel adaptive natural gradient estimation for the mini-batch learning mode, which is commonly adopted for big data analysis. For two representative stochastic neural network models, we present explicit parameter update rules and a learning algorithm. Through experiments on three benchmark problems, we confirm that the proposed method has convergence properties superior to those of conventional methods.
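For reference, the natural gradient update the abstract refers to can be written in Amari's standard form (this is the generic statement of natural gradient descent, not the mini-batch rule proposed in this paper):

$$
\theta_{t+1} = \theta_t - \eta\, G(\theta_t)^{-1} \nabla_\theta L(\theta_t), \qquad
G(\theta) = \mathbb{E}_{x \sim q(x),\, y \sim p(y \mid x;\theta)}\big[ \nabla_\theta \log p(y \mid x;\theta)\, \nabla_\theta \log p(y \mid x;\theta)^{\top} \big],
$$

where $G(\theta)$ is the Fisher information matrix. The expectation over the true input distribution $q(x)$ and the inversion of $G$, a square matrix whose dimension equals the number of parameters, are precisely the two practical obstacles described above.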

Highlights

  • In the age of the fifth generation (5G), artificial intelligence (AI) technologies are widening their application fields [1,2,3]

  • To address this problem and provide a stable, practically feasible algorithm for 5G network applications, this paper proposes a modified version of the adaptive natural gradient that can be exploited in the mini-batch learning mode

  • Since the natural gradient is derived from stochastic neural network models, we start with a brief description of the two popular stochastic models (see the sketch below)

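The two stochastic models mentioned in the last highlight are not named there; judging from the section outline below, they are presumably the additive Gaussian (regression) model and a softmax-type (classification) model, whose standard likelihoods are

$$
p(y \mid x;\theta) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left( -\frac{\lVert y - f(x;\theta) \rVert^2}{2\sigma^2} \right)
\qquad \text{(additive Gaussian model)},
$$

$$
p(y = k \mid x;\theta) = \frac{\exp f_k(x;\theta)}{\sum_j \exp f_j(x;\theta)}
\qquad \text{(softmax model)},
$$

where $f(x;\theta)$ is the deterministic output of the network. The softmax form is an assumption inferred from the spiral classification benchmark; the paper itself gives the exact model definitions.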

Summary

Introduction

In the age of the fifth generation (5G), artificial intelligence (AI) technologies are widening their application fields [1,2,3]. Since each parameter update is performed on a single subset of the data, the mini-batch learning mode retains the stochastic uncertainty of online learning while gaining computational efficiency through matrix calculations over a set of data, as in batch mode. For this practical learning strategy, we propose a method for iterative estimation of the natural gradient that is more stable and efficient than the conventional online adaptive natural gradient estimation.
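To make this strategy concrete, below is a minimal sketch of natural gradient descent in which the Fisher matrix is estimated adaptively, updated once per mini-batch, rather than recomputed from scratch. The function name, the callbacks score_fn and loss_grad_fn, the moving-average rate eps, and the update form are illustrative assumptions; the paper derives its own explicit update rules for each stochastic model.

```python
import numpy as np

def minibatch_adaptive_natural_gradient(params, score_fn, loss_grad_fn, batches,
                                         n_epochs=50, lr=0.01, eps=0.02):
    """Mini-batch natural-gradient descent with an adaptively estimated Fisher
    matrix (illustrative stand-in for the paper's update rule)."""
    F = np.eye(params.size)                                  # running Fisher estimate
    for _ in range(n_epochs):
        for X, y in batches:
            S = score_fn(params, X, y)                       # per-example score gradients, shape (batch, n)
            F = (1.0 - eps) * F + eps * (S.T @ S) / len(X)   # moving-average Fisher update
            g = loss_grad_fn(params, X, y)                   # mini-batch loss gradient, shape (n,)
            params = params - lr * np.linalg.solve(F, g)     # natural-gradient step
    return params


if __name__ == "__main__":
    # Toy additive Gaussian (linear regression) model as a usage example.
    rng = np.random.default_rng(0)
    w_true = np.array([2.0, -1.0])
    X_all = rng.normal(size=(256, 2))
    y_all = X_all @ w_true + 0.1 * rng.normal(size=256)

    score = lambda w, X, y: (X @ w - y)[:, None] * X         # per-example d(-log p)/dw (sigma = 1)
    lgrad = lambda w, X, y: X.T @ (X @ w - y) / len(X)       # mean squared-error gradient
    batches = [(X_all[i:i + 32], y_all[i:i + 32]) for i in range(0, 256, 32)]

    w = minibatch_adaptive_natural_gradient(np.zeros(2), score, lgrad, batches)
    print(w)                                                 # should end up close to w_true
```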

Stochastic Neural Networks
Gradient Descent Learning
Mini-Batch Learning Mode
Natural Gradient for Online Learning Mode
Adaptive Online Natural Gradient Learning
Adaptive Estimation of Fisher Information Matrix in Mini-Batch Mode
Adaptive Natural Gradient Learning Algorithm in Mini-Batch Mode
Experimental Settings
Experiment for Additive Gaussian Model (Mackey-Glass Data)
Experiment for Spiral Data
Conclusions