Deep learning, stochastic gradient descent and diffusion maps

Carmina Fjellström, Kaj Nyström

doi:10.1016/j.jcmds.2022.100054

Open Access

https://doi.org/10.1016/j.jcmds.2022.100054

Abstract

Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency, but a complete understanding of why SGD performs so well remains a major challenge. It has been observed empirically that, on the loss landscape of over-parametrized deep neural networks, most eigenvalues of the Hessian of the loss function are close to zero, while only a small number are large. Zero eigenvalues indicate zero diffusion along the corresponding directions. This suggests that minima selection mainly happens in the relatively low-dimensional subspace corresponding to the top eigenvalues of the Hessian. Although the parameter space is very high-dimensional, these findings seem to indicate that the SGD dynamics may mainly live on a low-dimensional manifold. In this paper, we pursue a truly data-driven approach to gaining a potentially deeper understanding of the high-dimensional parameter surface and, in particular, of the landscape traced out by SGD. We analyze the data generated through SGD, or any other optimizer for that matter, in order to possibly discover (local) low-dimensional representations of the optimization landscape. As our vehicle for the exploration, we use diffusion maps introduced by R. Coifman and coauthors.
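
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy quadratic loss, the noise model, the function names (`noisy_sgd_trajectory`, `diffusion_map`) and all parameter values are assumptions chosen for illustration. The sketch runs a noisy gradient iteration whose Hessian has two large eigenvalues and otherwise zeros, with noise scaled so that flat directions receive no diffusion, and then computes a basic diffusion map embedding of the resulting iterates.

```python
import numpy as np

def noisy_sgd_trajectory(hess_diag, theta0, lr=0.05, sigma=0.5, n_steps=300, seed=0):
    """Toy SGD on the quadratic loss 0.5 * sum(h_i * theta_i**2).

    Assumption: gradient noise has covariance proportional to the Hessian,
    so directions with zero eigenvalue get zero drift and zero diffusion.
    """
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    path = [theta.copy()]
    for _ in range(n_steps):
        noise = sigma * np.sqrt(hess_diag) * rng.normal(size=theta.shape)
        theta = theta - lr * (hess_diag * theta + noise)
        path.append(theta.copy())
    return np.array(path)

def diffusion_map(X, n_components=2, epsilon=None, alpha=1.0):
    """Basic diffusion map embedding of the rows of X."""
    # Pairwise squared Euclidean distances between iterates.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    if epsilon is None:
        epsilon = np.median(d2)  # heuristic bandwidth (an assumption)
    K = np.exp(-d2 / epsilon)
    # Density normalization of the kernel, controlled by alpha.
    q = K.sum(axis=1)
    K = K / np.outer(q, q) ** alpha
    # Symmetric matrix similar to the Markov matrix P = D^{-1} K.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Convert to right eigenvectors of P and drop the trivial first one.
    psi = vecs / np.sqrt(d)[:, None]
    lam = vals[1:n_components + 1]
    return lam, psi[:, 1:n_components + 1] * lam

dim = 20
hess_diag = np.zeros(dim)
hess_diag[:2] = [5.0, 1.0]  # two large eigenvalues, the rest exactly zero
theta0 = np.random.default_rng(1).normal(size=dim)

trajectory = noisy_sgd_trajectory(hess_diag, theta0)
eigvals, embedding = diffusion_map(trajectory, n_components=2)
print(eigvals, embedding.shape)
```

In this toy setting the iterates never move along the 18 flat directions, so the trajectory lies in a two-dimensional affine subspace of the 20-dimensional parameter space, and the diffusion coordinates are computed from distances within that subspace only. Real network trajectories are of course far less clean; the bandwidth `epsilon` and the normalization exponent `alpha` would need to be tuned to the data.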

Journal: Journal of Computational Mathematics and Data Science
Publication Date: Jun 28, 2022
Citations: 9
License type: CC BY-NC-ND
