An Adaptive Learning Rate Deep Learning Optimizer Using Long and Short-Term Gradients Based on G–L Fractional-Order Derivative

Shuang Chen, Changlun Zhang, Haibing Mu

doi:10.1007/s11063-024-11571-7

Abstract

Deep learning models are multi-layered networks whose parameters, which determine the model's final performance, must be trained by an optimizer. Mainstream optimizers use integer-order derivatives, which reflect only local information. Optimizers based on fractional-order derivatives, which can capture global information, are therefore gaining attention. However, relying solely on the long-term gradients estimated from fractional-order derivatives, while ignoring how recent gradients affect the optimization process, can sometimes trap the optimizer in local optima or slow it down. In this paper, we design an adaptive learning rate optimizer called AdaGL based on the Grünwald–Letnikov (G–L) fractional-order derivative. It dynamically adjusts the direction and step size of parameter updates according to both long-term and short-term gradient information, which helps it avoid getting stuck in local minima or saddle points. Specifically, exploiting the global memory of fractional-order calculus, we replace the gradient in the parameter update with a G–L fractional-order approximated gradient, making better use of past long-term curvature information. Furthermore, because recent gradient information often has a significant impact on the optimization phase, we propose a step size control coefficient that adjusts the learning rate in real time. To compare AdaGL with current state-of-the-art optimizers, we conduct experiments on several deep learning tasks: image classification with CNNs, node classification and graph classification with GNNs, image generation with GANs, and language modeling with LSTMs. Extensive experimental results show that AdaGL achieves stable and fast convergence, high accuracy, and good generalization.
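The abstract relies on the G–L derivative without stating it. As a hedged illustration, and not the authors' code, the sketch below evaluates the standard truncated G–L sum D^α f(t) ≈ h^(−α) Σ_{j=0}^{N} w_j f(t − jh) with weights w_j = (−1)^j C(α, j), and checks it against the closed form D^α t = t^(1−α) / Γ(2−α) for f(t) = t with lower terminal 0. The function names `gl_weights` and `gl_derivative` are hypothetical. The sketch also illustrates why the abstract calls fractional derivatives "global": for α = 1 only the two newest samples carry weight (an ordinary finite difference), whereas for 0 < α < 1 every past sample keeps a nonzero weight that decays like a power law.

```python
import math


def gl_weights(alpha: float, n: int) -> list[float]:
    """Return the first n Grünwald–Letnikov weights w_j = (-1)^j * C(alpha, j).

    Uses the recurrence w_0 = 1, w_j = w_{j-1} * (1 - (alpha + 1) / j),
    which avoids evaluating binomial coefficients with a non-integer alpha.
    """
    w = [1.0]
    for j in range(1, n):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / j))
    return w


def gl_derivative(f, t: float, alpha: float, h: float) -> float:
    """Truncated G–L approximation of the order-alpha derivative of f on [0, t].

    D^alpha f(t) ~= h^(-alpha) * sum_{j=0}^{N} w_j * f(t - j*h), N = round(t / h).
    Every sample of f back to the lower terminal 0 enters the sum: this is the
    'global memory' that integer-order derivatives lack.
    """
    n = int(round(t / h))
    w = gl_weights(alpha, n + 1)
    return sum(w[j] * f(t - j * h) for j in range(n + 1)) / h ** alpha


if __name__ == "__main__":
    # Integer order (alpha = 1): only the two newest samples have nonzero weight.
    print(gl_weights(1.0, 5))
    # Fractional order (alpha = 0.5): every past sample keeps a nonzero weight.
    print(gl_weights(0.5, 5))
    # Compare with the closed form D^alpha t = t^(1 - alpha) / Gamma(2 - alpha) at t = 1.
    approx = gl_derivative(lambda s: s, t=1.0, alpha=0.5, h=1e-3)
    exact = 1.0 / math.gamma(1.5)
    print(approx, exact)
```

Applying these weights to a stored window of past gradients, rather than to samples of f, gives one possible "long-term" gradient estimate. The abstract does not specify AdaGL's truncation length or how it combines that estimate with its step size control coefficient.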

Journal: Neural Processing Letters
Publication Date: Mar 15, 2024
License type: CC BY 4.0

Similar Papers

A Novel Fault Diagnosis Method Under Dynamic Working Conditions Based on a CNN With an Adaptive Learning Rate
Xiaodong Zhai ... Yumin Ma
IEEE Transactions on Instrumentation and Measurement | VOL. 71
01 Jan 2021

Author response: Neural learning rules for generating flexible predictions and computing the successor representation
Ching Fang ... Dmitriy Aronov
12 Oct 2022

Editor's evaluation: Neural learning rules for generating flexible predictions and computing the successor representation
Srdjan Ostojic
29 Aug 2022

Decision letter: Neural learning rules for generating flexible predictions and computing the successor representation
Arthur Juliani ... Timothy E Behrens
29 Aug 2022
