Abstract
Gradient descent is the workhorse of deep neural networks, but it suffers from slow convergence. The most common way to overcome slow convergence is to use momentum, which effectively increases the learning rate of gradient descent. Recently, many approaches have been proposed to control the momentum for better optimization towards the global minimum, such as Adam, diffGrad, and AdaBelief. Adam reduces the momentum by dividing it by the square root of the moving average of squared past gradients, i.e. the second moment. A sudden decrease in the second moment often causes the gradient to overshoot the minimum and settle at the nearest local minimum. diffGrad mitigates this problem by introducing into Adam a friction constant based on the difference between the current gradient and the immediately preceding gradient; however, this friction constant further reduces the momentum and results in slow convergence. AdaBelief adapts the step size according to the belief in the current gradient direction. Another well-known way to accelerate convergence is to increase the batch size adaptively. This paper proposes a new optimization technique, named adaptive diff-batch (adadb), that removes the overshooting problem of Adam and the slow convergence of diffGrad, and combines these methods with an adaptive batch size to further increase the convergence rate. The proposed technique uses a friction constant based on the past three gradient differences, rather than the single difference used in diffGrad, together with a condition that decides when the friction constant is applied. The proposed technique outperforms the Adam, diffGrad, and AdaBelief optimizers on synthetic complex non-convex functions and real-world datasets.
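The update rules being compared can be made concrete with a short sketch. The Python/NumPy snippet below shows Adam's second-moment scaling and the diffGrad-style sigmoid friction term; the three-difference friction is only an illustrative assumption, since the abstract states that three past gradient differences and a switching condition are used but does not give their exact form.

```python
import numpy as np

def adam_like_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: the first moment is scaled by the root of the second moment."""
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment (momentum)
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second moment
    state["t"] += 1
    m_hat = state["m"] / (1 - beta1 ** state["t"])             # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

def diffgrad_friction(grad, prev_grad):
    """diffGrad friction: sigmoid of |g_t - g_{t-1}|, damping steps when
    consecutive gradients are similar."""
    return 1.0 / (1.0 + np.exp(-np.abs(grad - prev_grad)))

def three_diff_friction(grad, past_grads):
    """Hypothetical friction built from the last three gradient differences
    (the paper only states that three differences are used)."""
    diffs = [np.abs(grad - g) for g in past_grads[-3:]]
    return 1.0 / (1.0 + np.exp(-np.mean(diffs, axis=0)))

# Minimal usage on f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
state = {"m": np.zeros_like(theta), "v": np.zeros_like(theta), "t": 0}
theta = adam_like_step(theta, 2 * theta, state)
```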
Highlights
In recent times, neural network-based algorithms have gained popularity due to the availability of big data and large computing power in the form of GPUs
Many attempts have been made to improve the convergence of gradient descent so that neural networks can fully benefit from big data and large computing power
The most famous way of increasing the convergence rate of gradient descent is the use of momentum
Summary
Neural network-based algorithms are gaining popularity due to the availability of big data and large computing power in the form of GPUs. The most famous way of increasing the convergence rate of gradient descent is the use of momentum. There are two renowned methods of controlling the convergence rate: reducing the momentum and increasing the batch size. Adam [3], diffGrad [4], and AdaBelief [5] reduce the momentum, whereas the adabatch technique [6] increases the batch size for faster optimization towards the global minimum. Adam and AdaBelief often overshoot the global minimum, whereas diffGrad suffers from slow convergence. We use both methods: control of the convergence rate and an increase in batch size. Combining an adaptive batch size with a convergence-control technique builds on our previous work, which showed success in improving the convergence rate [7].
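As a rough illustration of the batch-size ingredient, the sketch below shows an AdaBatch-style schedule that doubles the batch size at fixed epoch intervals while an Adam optimizer adapts the step size. The model, data, and schedule values are placeholders for illustration, not the configuration used in the paper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy regression data and model; all names and values here are illustrative.
X, y = torch.randn(1024, 10), torch.randn(1024, 1)
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

batch_size = 32
for epoch in range(12):
    # AdaBatch-style schedule: double the batch size every 4 epochs, so later
    # epochs take fewer but less noisy gradient steps.
    if epoch > 0 and epoch % 4 == 0:
        batch_size = min(batch_size * 2, len(X))
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    for xb, yb in loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
```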