Abstract
Estimating sequences, introduced by Nesterov, are an effective technique for accelerating gradient descent (GD). A stochastic version of estimating sequences has also been used successfully to speed up stochastic gradient descent (SGD). For non-smooth convex optimization problems, SGD with stochastic estimating sequences converges at a rate of O(1/k), where k is the number of iterations. In this paper, we present a new way of constructing estimating sequences. Their key characteristic is to replace the subgradient with the proximal stochastic gradient, and their novelty is to replace the fixed learning rate with an adaptive learning rate computed from an exponential moving average of past squared stochastic gradients. Based on the new estimating sequences, we propose an adaptive proximal SGD algorithm, called ES-APSGD, for solving large-scale ℓ1-norm regularized empirical risk minimization (ERM). ES-APSGD simplifies the computation and achieves a convergence rate of O(1/k²). A significant advantage of ES-APSGD is that it strengthens the sparsity of the solution by adaptively adjusting the threshold magnitude. Experimental results on Lasso and ℓ1-norm regularized logistic regression show that ES-APSGD speeds up convergence and obtains sparser optimal solutions.
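To make the core idea concrete, here is a minimal sketch of adaptive proximal SGD for an ℓ1-regularized objective: a per-coordinate step size built from an exponential moving average of past squared stochastic gradients, combined with a soft-thresholding (proximal) step whose threshold scales with that adaptive step size. This is an illustration of the general mechanism, not the paper's ES-APSGD construction; the names `grad_fn`, `eta`, `beta`, and `lam` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def adaptive_prox_sgd(grad_fn, x0, n_iters=1000, eta=0.1, beta=0.9,
                      lam=0.01, eps=1e-8):
    """Proximal SGD with an adaptive per-coordinate step size.

    grad_fn(x) should return an unbiased stochastic gradient of the smooth
    part of the objective at x; lam is the l1 regularization weight.
    (A generic sketch, not the paper's algorithm.)
    """
    x = x0.copy()
    v = np.zeros_like(x)              # EMA of squared stochastic gradients
    for _ in range(n_iters):
        g = grad_fn(x)
        v = beta * v + (1.0 - beta) * g ** 2
        step = eta / (np.sqrt(v) + eps)        # adaptive learning rate
        # Soft-thresholding step; the threshold lam * step adapts with the
        # step size, which is what encourages sparser iterates.
        x = soft_threshold(x - step * g, lam * step)
    return x

# Example: Lasso on synthetic data with minibatch stochastic gradients
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 50))
true_x = rng.standard_normal(50) * (rng.random(50) < 0.2)
b = A @ true_x + 0.1 * rng.standard_normal(500)

def lasso_grad(x, batch=32):
    idx = rng.integers(0, A.shape[0], size=batch)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / batch

x_hat = adaptive_prox_sgd(lasso_grad, np.zeros(50), lam=0.05)
```

Coordinates whose gradients stay small receive larger effective thresholds relative to their progress, so the adaptive rule tends to zero them out more aggressively than a fixed learning rate would, which is consistent with the sparsity behavior described in the abstract.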