A deterministic gradient-based approach to avoid saddle points

L M Kreusser, S J Osher, B Wang

doi:10.1017/s0956792522000316

Abstract

Loss functions with a large number of saddle points are one of the major obstacles to training modern machine learning (ML) models efficiently. First-order methods such as gradient descent (GD) are usually the methods of choice for training ML models. However, these methods converge to saddle points for certain choices of initial guesses. In this paper, we propose a modification of the recently proposed Laplacian smoothing gradient descent (LSGD) [Osher et al., arXiv:1806.06317], called modified LSGD (mLSGD), and demonstrate its potential to avoid saddle points without sacrificing the convergence rate. Our analysis is based on the attraction region, formed by all starting points for which the considered numerical scheme converges to a saddle point. We investigate the attraction region's dimension both analytically and numerically. For a canonical class of quadratic functions, we show that the dimension of the attraction region for mLSGD is $\lfloor (n-1)/2\rfloor$, and hence it is significantly smaller than that of GD, whose dimension is $n-1$.
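The notion of an attraction region can be illustrated with a minimal sketch for plain GD only. The quadratic $f(x) = \frac{1}{2}\sum_i c_i x_i^2$ with one negative coefficient, the values of `c`, the step size `eta`, and the starting points are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not implement LSGD or mLSGD. For this function the origin is a saddle point, and GD converges to it exactly when the starting point has a zero component along the negative-curvature direction, which is a set of dimension $n-1$.

```python
import numpy as np

def gradient_descent(x0, c, eta=0.1, steps=500):
    """Run plain GD on f(x) = 0.5 * sum(c_i * x_i**2), whose gradient is c * x."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = x - eta * c * x
    return x

# Illustrative (assumed) curvatures for n = 3: two positive, one negative,
# so the origin is a saddle point of f.
c = np.array([1.0, 2.0, -1.0])

# Start with x_3 = 0: the iterate stays in the plane x_3 = 0 and contracts
# toward the saddle at the origin.
inside = gradient_descent([1.0, -0.5, 0.0], c)

# Start with a tiny nonzero x_3: that component grows by a factor
# (1 + eta) per step, so the iterate escapes the saddle.
outside = gradient_descent([1.0, -0.5, 1e-8], c)

dist_inside = np.linalg.norm(inside)
dist_outside = np.linalg.norm(outside)
print("distance to saddle, start in the plane x_3 = 0:", dist_inside)
print("distance to saddle, start slightly off the plane:", dist_outside)
```

In this example the attraction region of GD is the plane $x_3 = 0$, of dimension $n-1 = 2$, which matches the GD dimension quoted in the abstract.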


More From: European Journal of Applied Mathematics

Similar Papers

A Generalized Attention Mechanism to Enhance the Accuracy Performance of Neural Networks.
Pengcheng Jiang ... Ujjwal Maulik
International journal of neural systems | VOL. 34
01 Dec 2024

Scientific Inference with Interpretable Machine Learning: Analyzing Models to Learn About Real-World Phenomena
Timo Freiesleben ... Álvaro Tejero-Cantero
Minds and Machines | VOL. 34
15 Jul 2024

Detecting APS failures using LSTM-AE and anomaly transformer enhanced with human expert analysis
Mehmet E Mumcuoglu ... Kerem Koprubasi
Engineering Failure Analysis | VOL. 165
23 Aug 2024

Optimal Donor Selection for Hematopoietic Cell Transplantation Using Bayesian Machine Learning.
Brent R Logan ... Martin J Maiers
JCO Clinical Cancer Informatics | VOL. 5
01 Dec 2021
