Abstract

Momentum acceleration is widely used to build gradient-based algorithms with fast convergence in large-scale optimization. Recently, Nesterov's momentum and Katyusha momentum have significantly improved the convergence of stochastic optimization methods. However, the practical gain of Nesterov's momentum is mainly a by-product of mini-batching, while relying on Katyusha momentum alone in the stochastic steps can make the optimization unstable. In this paper, we build a stochastic and doubly accelerated momentum method (SDAMM), which incorporates Nesterov's momentum and Katyusha momentum within a variance-reduction framework, to stabilize the accelerated algorithm and reduce the dependence on mini-batching. Theoretically, SDAMM achieves the best-known convergence rates for convex objectives. Experimental results demonstrate that SDAMM is competitive with state-of-the-art methods on optimization problems in machine learning.

Highlights

  • In this paper, we consider the composite convex optimization problem associated with regularized empirical risk minimization (ERM), which is pervasive in machine learning [1]

  • The acceleration in the stochastic and doubly accelerated momentum method (SDAMM) combines Nesterov's momentum in the outer epoch with Katyusha momentum in the inner iteration, while a practical importance sampling technique is employed for further acceleration (a hedged sketch of this structure is given after the list)

  • We prove that our SDAMM algorithm achieves the best-known convergence rate of O(1/T²) and a low computational complexity of O(n√(1/ε) + √(nL/ε))
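
To make the structure of the double acceleration concrete, the following is a minimal sketch: a Katyusha-style coupling and an SVRG-style variance-reduced gradient in the inner iteration, and a Nesterov-style extrapolation of the snapshot point across outer epochs. It assumes uniform sampling instead of the paper's importance sampling, omits the proximal handling of r(x), and uses simplified parameter choices; the names sdamm_like, grad_i, and full_grad are ours, so it should be read as an illustration of the idea rather than the paper's exact SDAMM update rules.

```python
# A minimal sketch of the double-acceleration structure described above.
# Assumptions: uniform sampling (the paper uses importance sampling), no
# proximal step for r(x), simplified parameter choices. The names
# sdamm_like, grad_i, full_grad are ours, not the paper's.
import numpy as np

def sdamm_like(grad_i, full_grad, x0, n, L, epochs=30, m=None):
    """grad_i(x, i): gradient of the i-th component at x;
    full_grad(x): gradient of the full (average) objective at x."""
    m = m or n                        # inner-loop length per epoch
    tau1, tau2 = 0.4, 0.5             # Katyusha-style coupling weights
    eta = 1.0 / (3.0 * L)             # step size for the y-sequence
    x_tilde = x_tilde_prev = x0.copy()
    y = z = x0.copy()
    for s in range(epochs):
        # Outer epoch: Nesterov-style extrapolation of the snapshot
        beta = s / (s + 3.0)
        snapshot = x_tilde + beta * (x_tilde - x_tilde_prev)
        mu = full_grad(snapshot)      # anchor gradient for variance reduction
        y_sum = np.zeros_like(x0)
        for _ in range(m):
            # Inner iteration: Katyusha momentum couples z, snapshot, and y
            x = tau1 * z + tau2 * snapshot + (1.0 - tau1 - tau2) * y
            i = np.random.randint(n)
            # SVRG-style variance-reduced gradient estimator
            g = grad_i(x, i) - grad_i(snapshot, i) + mu
            y = x - eta * g           # gradient-descent-like sequence
            z = z - (eta / tau1) * g  # mirror-descent-like sequence
            y_sum += y
        x_tilde_prev, x_tilde = x_tilde, y_sum / m
    return x_tilde
```

The design point mirrored here is that the snapshot used for variance reduction is itself extrapolated across epochs, so the outer Nesterov momentum and the inner Katyusha momentum act at two different time scales.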

Summary

Introduction

We consider the following composite convex optimization problem associated with regularized empirical risk minimization (ERM), which is pervasive in machine learning [1]: minimize F(x) = (1/n) Σ_{i=1}^n fi(x) + r(x) over x ∈ Rd. Each term fi(x) measures the fitness between x and the data sample indexed by i, and the function r(x) acts as a regularizer on x to avoid over-fitting the data. We consider this smooth optimization problem in the large-scale setting, and seek an approximate minimizer x ∈ Rd such that F(x) − F(x∗) ≤ ε, where x∗ is the exact minimizer of F(x).
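
As a concrete instance of this objective, the sketch below writes F(x) as L2-regularized logistic regression; the choice of logistic loss for fi and an L2 regularizer for r is an assumption made only for illustration (the names make_logreg_problem, grad_i, and full_grad are ours), since the setting above covers general smooth fi and convex r.

```python
# A purely illustrative instance of F(x) = (1/n) sum_i fi(x) + r(x):
# fi is the logistic loss on one sample, r is an L2 regularizer.
import numpy as np

def make_logreg_problem(A, b, lam):
    """A: (n, d) data matrix, b: (n,) labels in {-1, +1}, lam: L2 weight."""
    n = A.shape[0]

    def F(x):
        # full objective: average logistic loss plus L2 regularization
        losses = np.log1p(np.exp(-b * (A @ x)))
        return losses.mean() + 0.5 * lam * x.dot(x)

    def grad_i(x, i):
        # gradient of fi(x) + r(x) for a single sample i
        s = -b[i] / (1.0 + np.exp(b[i] * A[i].dot(x)))
        return s * A[i] + lam * x

    def full_grad(x):
        # gradient of F(x), averaged over all n samples
        s = -b / (1.0 + np.exp(b * (A @ x)))
        return (A.T @ s) / n + lam * x

    return F, grad_i, full_grad
```

For this instance, the smoothness constant L required by a gradient-based solver can be bounded by max_i ‖ai‖²/4 + λ, where ai is the i-th row of A.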
