The incremental gradient (IG) method is a prominent algorithm for minimizing a finite sum of smooth convex functions, used in many contexts including large-scale data processing applications and distributed optimization over networks. It is a first-order method that processes the functions one at a time based on their gradient information. The incremental Newton method, on the other hand, is a second-order variant which additionally exploits the curvature information of the underlying functions and can therefore be faster. In this paper, we focus on the case when the objective function is strongly convex and present fast convergence results for the IG and incremental Newton methods under constant and diminishing stepsizes. For a decaying stepsize rule $\alpha_k = \Theta(1/k^s)$ with $s \in (0,1]$, we show that the distance of the IG iterates to the optimal solution converges at rate ${\cal O}(1/k^{s})$ (which translates into an ${\cal O}(1/k^{2s})$ rate in the suboptimality of the objective value). For $s>1/2$, this improves on the previous ${\cal O}(1/\sqrt{k})$ results in distances obtained for the case when the functions are non-smooth. We show that to achieve the fastest ${\cal O}(1/k)$ rate, the IG method requires a stepsize tuned to the strong convexity parameter, whereas the incremental Newton method does not. Our results are based on viewing the IG method as a gradient descent method with gradient errors, devising efficient upper bounds on the gradient error to derive inequalities that relate the distances of consecutive iterates to the optimal solution, and finally applying Chung's lemmas from the stochastic approximation literature to these inequalities to determine their asymptotic behavior. In addition, we construct examples showing that our rate results are tight.
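As a concrete illustration of the IG update with the decaying stepsize rule above, the following is a minimal Python sketch on a toy strongly convex least-squares sum (not the paper's implementation); the constants $C$ and $s$, the problem data, and the convention that one outer iteration corresponds to a full cyclic pass over the components are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's code) of the incremental gradient (IG) method
# on a toy strongly convex finite sum: f(x) = sum_i 0.5 * (a_i^T x - b_i)^2.
# The diminishing stepsize alpha_k = C / k^s with s in (0, 1] mirrors the
# decaying rule discussed above; C, s, and the problem data are illustrative.

rng = np.random.default_rng(0)
m, n = 50, 5                                    # number of components, dimension
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimizer of the full sum

def incremental_gradient(x0, C=0.05, s=1.0, n_cycles=2000):
    """One outer iteration k = one deterministic cyclic pass over the components."""
    x = x0.copy()
    for k in range(1, n_cycles + 1):
        alpha = C / k**s                        # stepsize alpha_k = C / k^s
        for i in range(m):                      # process components one at a time
            grad_i = (A[i] @ x - b[i]) * A[i]   # gradient of the i-th component
            x -= alpha * grad_i
    return x

x_final = incremental_gradient(np.zeros(n))
print("distance to optimum:", np.linalg.norm(x_final - x_star))
```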