Abstract

In this paper, we provide new results and algorithms (including backtracking versions of Nesterov accelerated gradient and Momentum) which are more applicable to large-scale optimisation, as in Deep Neural Networks. We also demonstrate that Backtracking Gradient Descent (Backtracking GD) can obtain good upper bound estimates for local Lipschitz constants of the gradient, and that the convergence rate of Backtracking GD is similar to that in the classical work of Armijo. Experiments on the CIFAR10 and CIFAR100 datasets with various popular architectures verify a heuristic argument that, in the mini-batch setting, Backtracking GD stabilises to a finite union of sequences constructed from Standard GD, and show that our new algorithms (while automatically fine-tuning learning rates) perform better than current state-of-the-art methods such as Adam, Adagrad, Adadelta, RMSProp, Momentum and Nesterov accelerated gradient. To help readers avoid confusion between heuristics and more rigorously justified algorithms, we also provide a review of the current state of convergence results for gradient descent methods. Accompanying source code is available on GitHub.
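To make the terminology concrete, the following is a minimal sketch of the classical Armijo backtracking rule on which Backtracking GD is built. It is not the authors' implementation; the function name and the default values of delta0, alpha and beta are illustrative assumptions.

```python
import numpy as np

def backtracking_gd_step(f, grad_f, x, delta0=1.0, alpha=0.5, beta=0.5):
    """One gradient descent step with Armijo backtracking line search (sketch).

    Starting from a trial learning rate delta0, the rate is shrunk by the
    factor beta until the Armijo condition
        f(x - delta * g) <= f(x) - alpha * delta * ||g||^2
    holds, where g = grad_f(x). As discussed in the paper, the accepted
    delta also carries information about the local Lipschitz constant of
    the gradient (roughly, the constant is bounded in terms of 1/delta).
    """
    g = grad_f(x)
    fx = f(x)
    gg = np.dot(g, g)
    delta = delta0
    while f(x - delta * g) > fx - alpha * delta * gg:
        delta *= beta
    return x - delta * g, delta

# Illustrative usage on a simple quadratic cost f(x) = 0.5 * ||x||^2.
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x = np.array([3.0, -4.0])
for _ in range(10):
    x, delta = backtracking_gd_step(f, grad_f, x)
```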

Highlights

  • We provide a non-technical overview of the important role and current practices of Gradient Descent methods (GD) in optimisation, in particular in large-scale optimisation as in Deep Neural Networks (DNN), and of some new features of our main results in this paper. One special feature of modern society is the need to solve large-scale optimisation problems quickly, stably, efficiently and reproducibly.

  • Note that, as mentioned in the Introduction, new versions of Backtracking GD (based on results and ideas in Truong and Nguyen [47]) are proposed in Truong [44] and shown to avoid saddle points under assumptions more general than those required by Lee et al. [27] and Panageas and Piliouras [34] for Standard GD.

  • We run experiments with Two-way Backtracking GD for two cost functions: one is the Mexican hat in Example 3.3, and the other is the cost function obtained by applying Resnet18 to a random subset of 500 samples from CIFAR10 (a sketch of the two-way backtracking rule follows these highlights).

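As we read the paper, Two-way Backtracking GD differs from plain Backtracking GD in that the line search at each iteration starts from the learning rate accepted at the previous iteration, and may either increase or decrease it. The sketch below illustrates that idea only; the function name, the cap delta_max and the values of alpha and beta are assumptions, not the authors' implementation.

```python
import numpy as np

def two_way_backtracking_step(f, grad_f, x, delta_prev,
                              alpha=0.5, beta=0.5, delta_max=10.0):
    """One step of a sketched Two-way Backtracking GD.

    The trial learning rate starts from the rate accepted at the previous
    iteration. If the Armijo condition already holds, the rate is tentatively
    increased (divided by beta, capped at delta_max); otherwise it is shrunk
    by beta until the condition holds.
    """
    g = grad_f(x)
    fx = f(x)
    gg = np.dot(g, g)
    armijo = lambda d: f(x - d * g) <= fx - alpha * d * gg
    delta = delta_prev
    if armijo(delta):
        # Try larger rates while they still satisfy the Armijo condition.
        while delta / beta <= delta_max and armijo(delta / beta):
            delta /= beta
    else:
        # Shrink until the Armijo condition holds.
        while not armijo(delta):
            delta *= beta
    return x - delta * g, delta
```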

Summary

Introduction

We provide a non-technical overview of the important role and current practices of Gradient Descent methods (GD) in optimisation, in particular in large-scale optimisation as in Deep Neural Networks (DNN), and of some new features of our main results in this paper. One special feature of modern society is the need to solve large-scale optimisation problems quickly, stably, efficiently and reproducibly. One exemplar of this is the development of Deep Learning, which has obtained spectacular achievements recently. Modern state-of-the-art DNN can have millions of parameters. At this scale, the only tools one can rely on are numerical optimisation algorithms, which serve to bring us close to good local minima. As impressive as these achievements are, there are many serious concerns about current practices in Deep Learning, for example that trained models are easily fooled and are still not safe. The class of functions we work with is the most general one in which current techniques for non-convex optimisation can be applied, and it is flexible enough to adapt to many kinds of realistic applications.

A Brief Introduction to Gradient Descent Methods
What is the State-of-the-Art for Convergence of GD Methods?
What is New About this Paper?
Overview and Comparison of Previous Results
A General Convergence Result for Backtracking GD
Comparison to Previous Work
Backtracking Versions of MMT and NAG
A Heuristic Argument for the Effectiveness of Standard GD
Two-Way Backtracking GD
Unbounded Backtracking GD
Rescaling of Learning Rates
Mini-batch Backtracking Algorithms
Experimental Results
Experiment 1
Experiment 2
Experiment 3
Experiment 4
Experiment 5
Conclusions