Abstract

Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima. The same problem is often found in physical simulations and may be resolved by the methods of Langevin dynamics with Simulated Annealing, a well-established approach for the minimization of many-particle potentials. This analogy provides useful insights for non-convex stochastic optimization in machine learning. Here we find that integration of the discretized Langevin equation gives a coordinate-updating rule equivalent to the well-known Momentum optimization algorithm. As the main result, we show that gradually decreasing the momentum coefficient from an initial value close to unity down to zero is equivalent to applying Simulated Annealing, or slow cooling in physical terms. Building on this observation, we propose CoolMomentum, a new stochastic optimization method. Applying CoolMomentum to the optimization of ResNet-20 on the CIFAR-10 dataset and EfficientNet-B0 on ImageNet, we demonstrate that it achieves high accuracies.
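The update rule described in the abstract can be sketched in a few lines: a standard Momentum step whose momentum coefficient is "cooled" from a value near unity toward zero over the course of training. This is a minimal illustration under an assumed linear cooling schedule (the paper derives its schedule from the Langevin-dynamics analogy); the function name and hyperparameters are placeholders, not the authors' reference implementation:

```python
import numpy as np

def coolmomentum_sketch(grad, theta0, lr=0.1, rho0=0.99, steps=500):
    """Momentum optimization with an annealed ('cooled') momentum coefficient.

    grad   : function returning the gradient of the objective at theta
    theta0 : initial parameter vector
    rho0   : initial momentum coefficient, close to unity

    The linear decay of rho below is an illustrative assumption.
    """
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for t in range(steps):
        rho = rho0 * (1.0 - t / steps)   # cool momentum from rho0 toward 0
        v = rho * v - lr * grad(theta)   # velocity (momentum) update
        theta = theta + v                # coordinate update
    return theta

# Toy example: minimize the quadratic bowl f(x) = ||x||^2 / 2, grad f = x.
x_min = coolmomentum_sketch(lambda x: x, [3.0, -2.0])
```

Early in training the large momentum coefficient lets the iterate traverse barriers between local minima (the "hot" phase); as the coefficient decays, the dynamics become increasingly damped and the iterate settles into a minimum.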

Highlights

  • Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima

  • It is shown that several optimization algorithms, e.g. stochastic gradient descent (SGD) with momentum [3], Adagrad [4], RMSProp [5], Adadelta [6] and Adam [7], are efficient for training artificial neural networks and optimization of non-convex objective functions [8,9]. In the non-convex setting, the objective function has multiple local minima and the efficient algorithms rely on the "hill climbing" heuristics

  • We propose to adapt the methods of Langevin dynamics to the problems of nonconvex optimization, that appear in machine learning

Introduction

Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima. Training a machine learning model amounts to finding the parameter values that optimize an objective function.

