On the Existence of Global Minima and Convergence Analyses for Gradient Descent Methods in the Training of Deep Neural Networks

Arnulf Jentzen, Adrian Riekert

doi:10.4208/jml.220114a

Abstract

In this article, we study fully connected feedforward deep ReLU artificial neural networks (ANNs) with an arbitrarily large number of hidden layers. We prove convergence of the risk of the gradient descent (GD) optimization method with random initializations in the training of such ANNs under three assumptions: that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, that the target function (describing the relationship between the input data and the output data) is piecewise polynomial, and that the risk function of the considered supervised learning problem admits at least one regular global minimum. In addition, in the special situation of shallow ANNs with just one hidden layer and one-dimensional input, we also verify this last assumption by proving, in the training of such shallow ANNs, that for every Lipschitz continuous target function there exists a global minimum in the risk landscape. Finally, in the training of deep ANNs with ReLU activation, we also study solutions of gradient flow (GF) differential equations, and we prove that every non-divergent GF trajectory converges with a polynomial rate of convergence to a critical point (in the sense of limiting Fréchet subdifferentiability). Our mathematical convergence analysis builds on ideas from our previous article Eberle et al., on tools from real algebraic geometry such as the concept of semi-algebraic functions and generalized Kurdyka-Łojasiewicz inequalities, on tools from functional analysis such as the Arzelà-Ascoli theorem, on tools from nonsmooth analysis such as the concept of limiting Fréchet subgradients, and on the fact, shown by Petersen et al., that the set of realization functions of shallow ReLU ANNs with fixed architecture forms a closed subset of the set of continuous functions.
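To make the setting of the abstract concrete, the following is a minimal sketch of gradient descent with random initializations for a shallow ReLU network with one-dimensional input. It is an illustration under stated assumptions, not the construction analyzed in the article: the uniform input density on [0, 1], the midpoint-rule approximation of the risk integral, the standard normal initialization, the convention that the ReLU derivative at 0 is 0, and all function names (`realization`, `risk`, `gradient`, `gradient_descent`, `random_init`, `midpoint_grid`) are choices made here for the example.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def realization(theta, x):
    """Shallow ReLU network, 1-d input: sum_j v_j * relu(w_j * x + b_j) + c."""
    w, b, v, c = theta
    return sum(vj * relu(wj * x + bj) for wj, bj, vj in zip(w, b, v)) + c


def midpoint_grid(n):
    """Midpoints of n equal cells of [0, 1] (assumed uniform input density)."""
    return [(i + 0.5) / n for i in range(n)]


def risk(theta, xs, target):
    """Midpoint-rule approximation of the mean squared error risk on the grid xs."""
    return sum((realization(theta, x) - target(x)) ** 2 for x in xs) / len(xs)


def gradient(theta, xs, target):
    """Gradient of the approximate risk, with the convention relu'(0) = 0."""
    w, b, v, c = theta
    gw, gb, gv, gc = [0.0] * len(w), [0.0] * len(w), [0.0] * len(w), 0.0
    for x in xs:
        err = 2.0 * (realization(theta, x) - target(x)) / len(xs)
        for j in range(len(w)):
            pre = w[j] * x + b[j]
            if pre > 0.0:  # only active neurons contribute
                gw[j] += err * v[j] * x
                gb[j] += err * v[j]
                gv[j] += err * pre
        gc += err
    return gw, gb, gv, gc


def random_init(width, seed):
    """Standard normal initialization (an assumption made for this sketch)."""
    rng = random.Random(seed)
    draw = lambda: [rng.gauss(0.0, 1.0) for _ in range(width)]
    return (draw(), draw(), draw(), 0.0)


def gradient_descent(theta, steps, lr, xs, target):
    """Plain GD with constant step size; returns final parameters and risk history."""
    history = [risk(theta, xs, target)]
    for _ in range(steps):
        w, b, v, c = theta
        gw, gb, gv, gc = gradient(theta, xs, target)
        theta = (
            [p - lr * g for p, g in zip(w, gw)],
            [p - lr * g for p, g in zip(b, gb)],
            [p - lr * g for p, g in zip(v, gv)],
            c - lr * gc,
        )
        history.append(risk(theta, xs, target))
    return theta, history


if __name__ == "__main__":
    xs = midpoint_grid(50)
    target = lambda x: x * x  # a polynomial, hence piecewise polynomial, target
    # Several independent random initializations; report the final risk of each.
    for seed in range(5):
        _, history = gradient_descent(random_init(4, seed), 500, 0.05, xs, target)
        print(seed, history[0], history[-1])
```

Running GD from several independent random initializations and comparing the final risks mirrors the random-initialization setting described above; the step size and number of steps in the demo are arbitrary illustrative values.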

Similar Papers

Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation
Simon Eberle ... Adrian Riekert
Electronic Research Archive | VOL. 31
01 Jan 2023

A convergence analysis of Nesterov’s accelerated gradient method in training deep linear neural networks
Xin Liu ... Zhisong Pan
Information Sciences | VOL. 612
05 Sep 2022

Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
Arnulf Jentzen ... Timo Welti
Applied Mathematics and Computation | VOL. 455
11 May 2023

CONVOLUTIONS OF HYPER-ERLANG AND OF ERLANG DISTRIBUTIONS
T Kadri ... K Smaili
International Journal of Pure and Applied Mathematics | VOL. 98
07 Jan 2015
