Abstract

Estimating sequences, introduced by Nesterov, are an effective technique for accelerating gradient descent (GD). Their stochastic counterpart has likewise been used to speed up stochastic gradient descent (SGD). For non-smooth convex optimization problems, SGD with stochastic estimating sequences converges at a rate of O(1/k), where k is the number of iterations. In this paper, we present a new way of constructing estimating sequences with two key features: the subgradient is replaced by the proximal stochastic gradient, and the fixed learning rate is replaced by an adaptive learning rate computed from an exponential moving average of past squared stochastic gradients. Based on these estimating sequences, we propose an adaptive proximal SGD algorithm, ES-APSGD, for large-scale ℓ1-norm regularized empirical risk minimization (ERM). ES-APSGD simplifies the computation and attains a convergence rate of O(1/k²). A significant advantage of ES-APSGD is that it strengthens the sparsity of the solution by adaptively adjusting the threshold magnitude. Experimental results on Lasso and ℓ1-norm regularized logistic regression show that ES-APSGD accelerates convergence and yields sparser solutions.
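To illustrate the kind of update the abstract describes, the sketch below combines a proximal (soft-thresholding) step for the ℓ1 term with a coordinate-wise learning rate built from an exponential moving average of past squared stochastic gradients. This is a generic illustration under assumed parameter names (base_lr, beta, eps), not the paper's ES-APSGD algorithm or its estimating-sequence construction.

import numpy as np

def soft_threshold(x, tau):
    """Soft-thresholding operator: the proximal map of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def adaptive_proximal_sgd(grad_fn, x0, lam, n_iters=1000,
                          base_lr=0.01, beta=0.9, eps=1e-8):
    """Illustrative adaptive proximal SGD loop (not ES-APSGD itself).

    grad_fn(x) -> stochastic gradient of the smooth loss at x
    lam        -> l1 regularization weight
    beta       -> decay factor for the exponential moving average
                  of past squared stochastic gradients
    """
    x = x0.copy()
    v = np.zeros_like(x)  # EMA of squared stochastic gradients
    for _ in range(n_iters):
        g = grad_fn(x)
        v = beta * v + (1.0 - beta) * g * g
        lr = base_lr / (np.sqrt(v) + eps)  # coordinate-wise adaptive step size
        # Proximal step: the threshold lr * lam scales with the adaptive
        # learning rate, which is what adjusts the threshold magnitude
        # and promotes sparser iterates.
        x = soft_threshold(x - lr * g, lr * lam)
    return x

Note the design choice this sketch highlights: because the soft-thresholding level is lr * lam, an adaptive per-coordinate learning rate directly rescales the threshold, which is how an adaptive proximal method can encourage additional sparsity relative to a fixed step size.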
