Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Pascal Bianchi, Walid Hachem, Sholom Schechtman

doi:10.1007/s11228-022-00638-z

Abstract

This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent for the minimization of an unknown function, defined as the expectation of a non-convex, non-smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a reasonable choice is to use an element of the Clarke subdifferential of the random function; another choice is the output of the celebrated backpropagation algorithm, which is popular among practitioners, and whose properties have recently been studied by Bolte and Pauwels. Since the expectation of the chosen operator is not in general an element of the Clarke subdifferential of the mean function, it has been assumed in the literature that an oracle of the Clarke subdifferential of the mean function is available. As a first result, it is shown in this paper that such an oracle is not needed for almost all initialization points of the algorithm. Next, in the small step size regime, it is shown that the interpolated trajectory of the algorithm converges in probability (in the compact convergence sense) towards the set of solutions of a particular differential inclusion: the subgradient flow. Finally, viewing the iterates as a Markov chain whose transition kernel is indexed by the step size, it is shown that the invariant distributions of the kernel converge weakly to the set of invariant distributions of this differential inclusion as the step size tends to zero. These results show that when the step size is small, with large probability, the iterates eventually lie in a neighborhood of the critical points of the mean function.
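
As a rough illustration of the setting described in the abstract, the sketch below runs constant step SGD on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the random function f(x, xi) = |x^2 - 1| + xi * x with zero-mean Gaussian xi, the names subgradient_oracle and constant_step_sgd, and the parameter values. Under this choice the mean function is F(x) = |x^2 - 1|, which is non-convex and non-smooth at x = -1 and x = 1, and its Clarke critical points are -1, 0 and 1. The oracle returns one element of the Clarke subdifferential of f(., xi).

```python
import numpy as np

def subgradient_oracle(x, xi):
    # One element of the Clarke subdifferential of the (assumed) toy function
    # f(x, xi) = |x**2 - 1| + xi * x. At the kinks x = +/-1, np.sign returns 0,
    # which selects the element xi from the interval [-2 + xi, 2 + xi].
    return np.sign(x * x - 1.0) * 2.0 * x + xi

def constant_step_sgd(x0, step, n_iter, noise_std=0.5, seed=0):
    # Standard constant step recursion: x_{n+1} = x_n - step * g(x_n, xi_{n+1}).
    rng = np.random.default_rng(seed)
    x = float(x0)
    path = np.empty(n_iter + 1)
    path[0] = x
    for n in range(n_iter):
        xi = rng.normal(0.0, noise_std)  # zero-mean noise, so E[f(x, xi)] = |x**2 - 1|
        x = x - step * subgradient_oracle(x, xi)
        path[n + 1] = x
    return path

# Illustrative parameter values (assumptions, not from the paper).
path = constant_step_sgd(x0=2.0, step=0.01, n_iter=5000)

# Distance of the last 1000 iterates to the critical set {-1, 0, 1} of F.
critical_points = np.array([-1.0, 0.0, 1.0])
tail = path[-1000:]
dist_to_critical = np.min(np.abs(tail[:, None] - critical_points), axis=1)
print("largest distance to a critical point over the tail:", dist_to_critical.max())
```

With a small step, the tail of the trajectory in this toy run stays close to one of the critical points, which is the qualitative behavior the abstract describes. The iterates do not converge to a point: with a constant step they keep fluctuating in a neighborhood whose size shrinks with the step.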

Journal: Set-Valued and Variational Analysis
Publication Date: Apr 8, 2022
Citations: 16