Shuffling-type gradient method with bandwidth-based step sizes for finite-sum optimization

Yuqing Liang, Yang Yang, Jinlan Liu, Dongpo Xu

doi:10.1016/j.neunet.2024.106514

Abstract

The shuffling-type gradient method is a popular machine learning algorithm that solves finite-sum optimization problems by randomly shuffling the samples during iterations. In this paper, we study the convergence properties of the shuffling-type gradient method under mild assumptions. Specifically, we employ a bandwidth-based step size strategy that covers both monotonic and non-monotonic step sizes, thereby providing a unified convergence guarantee with respect to the step size. Additionally, we replace the lower-bound assumption on the objective function with one on the loss function, which eliminates the restrictions on the variance and the second-order moment of the stochastic gradient that are difficult to verify in practice. For non-convex objectives, we recover the last-iterate convergence of the shuffling-type gradient algorithm with a less cumbersome proof, and we also establish a convergence rate for the minimum gradient norm over the iterations. Under the Polyak-Łojasiewicz (PL) condition, we prove that the function value at the last iterate converges to the lower bound of the objective function. By selecting appropriate boundary functions, we further improve the previous sublinear convergence rate results. Overall, this paper contributes to the understanding of the shuffling-type gradient method and its convergence properties, providing insights for solving finite-sum problems in machine learning. Finally, numerical experiments demonstrate the efficiency of the shuffling-type gradient method with bandwidth-based step sizes and validate our theoretical results.
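To make the idea in the abstract concrete, the following is a minimal sketch of a shuffling-type gradient loop on a toy one-dimensional least-squares problem. It is not the authors' implementation. The band form (step sizes kept between `lower / epoch` and `upper / epoch`), the cosine oscillation inside the band, the per-sample scaling by `1 / n`, and all function names and constants are assumptions chosen for illustration only.

```python
import math
import random


def bandwidth_step(epoch, lower=0.1, upper=0.2):
    """Return a step size inside the band [lower / epoch, upper / epoch].

    Assumption: the boundary functions are lower / epoch and upper / epoch,
    and the step oscillates between them via a cosine, so the resulting
    sequence is not monotonically decreasing.
    """
    mix = 0.5 * (1.0 + math.cos(epoch))  # value in [0, 1]
    return (lower + (upper - lower) * mix) / epoch


def shuffling_gradient(xs, ys, epochs=200, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (w * x_i - y_i)^2 over a scalar w.

    Each epoch draws a fresh random permutation of the samples and then
    takes one gradient step per sample in that order (random reshuffling).
    """
    rng = random.Random(seed)
    n = len(xs)
    indices = list(range(n))
    w = 0.0
    for epoch in range(1, epochs + 1):
        rng.shuffle(indices)          # new sample order for this epoch
        eta = bandwidth_step(epoch)   # one step size per epoch
        for i in indices:
            grad_i = (w * xs[i] - ys[i]) * xs[i]  # gradient of the i-th loss
            w -= (eta / n) * grad_i
    return w


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x for x in xs]  # noise-free data generated with w = 2
    print(shuffling_gradient(xs, ys))
```

On this noise-free toy data the iterate moves toward `w = 2`. The step sizes stay inside the assumed band while still being allowed to increase from one epoch to the next, which is the kind of non-monotonic schedule the bandwidth-based strategy is meant to cover.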



More From: Neural Networks


Similar Papers

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming
Zhaosong Lu ... Lin Xiao
SIAM Journal on Numerical Analysis | VOL. 55
01 Jan 2017

SGD-[formula omitted]: A real-time [formula omitted]-suffix averaging method for SGD with biased gradient estimates
Jianqi Luo ... Huisheng Zhang
Neurocomputing | VOL. 487
26 Feb 2022

Adaptive Stochastic Gradient Descent Method for Convex and Non-Convex Optimization
Ruijuan Chen ... Xiaoquan Tang
Fractal and Fractional | VOL. 6
29 Nov 2022

Novel DCA based algorithms for a special class of nonconvex problems with application in machine learning
Hoai An Le Thi ... Bach Tran
Applied Mathematics and Computation | VOL. 409
28 Dec 2020
