Abstract

Machine Learning (ML) relies heavily on optimization techniques built upon gradient descent. Numerous gradient-based update methods have been proposed in the scientific literature, particularly in the context of neural networks, and have gained widespread adoption as optimizers in ML software libraries. This paper introduces a novel perspective by framing gradient-based update strategies through the Moreau-Yosida (MY) approximation of the loss function. Leveraging a first-order Taylor expansion, we show how the MY approximation can be concretely exploited to generalize the model update process. This enables the evaluation and comparison of the regularization properties underlying popular optimizers such as gradient descent with momentum, ADAGRAD, RMSprop, and ADAM. The MY-based unifying view opens up possibilities for designing new update schemes with customizable regularization properties. To illustrate this potential, we propose a case study that redefines the notion of closeness in parameter space in terms of network outputs. We present a proof-of-concept experimental procedure demonstrating the effectiveness of this approach in continual learning scenarios. Specifically, we employ the well-known permuted MNIST benchmark, progressively-permuted variants of MNIST and CIFAR-10, and a non-i.i.d. data stream. Additionally, we validate the update scheme's efficacy in an offline-learning scenario. By embracing the MY-based unifying view, we pave the way for advancements in optimization techniques for machine learning.
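
As a minimal sketch of the connection the abstract alludes to (notation assumed here, not taken from the paper body: loss f, parameters theta_t, proximal step size lambda), the MY view replaces a plain gradient step with a proximal problem, and a first-order Taylor expansion of the loss recovers vanilla gradient descent:

% Proximal (Moreau-Yosida) view of a parameter update; notation assumed.
\theta_{t+1} = \operatorname{prox}_{\lambda f}(\theta_t)
             = \arg\min_{\theta}\; f(\theta) + \frac{1}{2\lambda}\,\lVert \theta - \theta_t \rVert_2^2
% Linearizing f around \theta_t,
%   f(\theta) \approx f(\theta_t) + \nabla f(\theta_t)^\top (\theta - \theta_t),
% and setting the gradient of the resulting quadratic objective to zero yields
\theta_{t+1} \approx \theta_t - \lambda\,\nabla f(\theta_t)

Replacing the squared Euclidean proximity term above with a different notion of closeness (for instance, a distance measured on network outputs, as in the paper's case study) then yields update schemes with different regularization properties.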
