Abstract

In this paper, we present new second-order methods with convergence rate $O\left(k^{-4}\right)$, where $k$ is the iteration counter. This is faster than the existing lower bound for this type of schemes (Agarwal and Hazan in Proceedings of the 31st conference on learning theory, PMLR, pp. 774–792, 2018; Arjevani and Shiff in Math Program 178(1–2):327–360, 2019), which is $O\left(k^{-7/2}\right)$. Our progress can be explained by a finer specification of the problem class. The main idea of this approach consists in implementation of the third-order scheme from Nesterov (Math Program 186:157–183, 2021) using the second-order oracle. At each iteration of our method, we solve a nontrivial auxiliary problem by a linearly convergent scheme based on the relative non-degeneracy condition (Bauschke et al. in Math Oper Res 42:330–348, 2016; Lu et al. in SIOPT 28(1):333–354, 2018). During this process, the Hessian of the objective function is computed once, and the gradient is computed $O\left(\ln \frac{1}{\epsilon}\right)$ times, where $\epsilon$ is the desired accuracy of the solution for our problem.
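
To make the cost pattern described above concrete, here is a minimal sketch (not the paper's actual algorithm) of a Bregman-type gradient method that converges linearly when the auxiliary objective is smooth and strongly convex relative to a fixed quadratic reference function $\rho(y)=\frac{1}{2}\langle Ay,y\rangle$. The function names, the quadratic reference, and the stopping rule are illustrative assumptions; in the paper the reference function and the auxiliary problem have a specific, more involved structure. The sketch only shows the per-iteration accounting: the matrix $A$ (e.g., a Hessian) is fixed, while the gradient is called once per iteration, and linear convergence keeps the number of gradient calls at $O\left(\ln \frac{1}{\epsilon}\right)$.

    import numpy as np

    def bregman_gradient(grad, A, x0, L_rel, mu_rel, eps, max_iter=10000):
        # Hypothetical sketch: minimize phi, assuming phi is L_rel-smooth and
        # mu_rel-strongly convex relative to rho(y) = 0.5 * y^T A y, where A is
        # a fixed positive-definite matrix (e.g. a Hessian computed once).
        # One Bregman step: x+ = argmin_y <grad(x), y> + L_rel * D_rho(y, x)
        #                      = x - (1/L_rel) * A^{-1} grad(x).
        x = np.asarray(x0, dtype=float).copy()
        for k in range(max_iter):
            g = grad(x)                              # one gradient call per iteration
            x_next = x - np.linalg.solve(A, g) / L_rel
            # Under relative strong convexity the objective gap contracts by a
            # factor (1 - mu_rel / L_rel) per step, hence O(ln(1/eps)) iterations.
            if np.linalg.norm(x_next - x) <= eps:
                return x_next, k + 1
            x = x_next
        return x, max_iter

With $A$ taken as a Hessian evaluated once at the current outer point, every inner step costs one gradient call plus a linear solve against the same matrix, which is consistent with the per-iteration accounting in the abstract.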

Highlights

  • In recent years, the theory of high-order methods in convex optimization has been developed seemingly up to its natural limits

  • Since the auxiliary problem in tensor methods can be posed as a problem of minimizing a convex multivariate polynomial [15], the performance of these methods was very soon increased up to the maximal limits [6,7,9], given by the theoretical lower complexity bounds [1,2]

  • We conclude that the existing classification of the problem classes, optimization schemes, and complexity bounds is not perfect


Summary


The auxiliary problem in tensor methods can be posed as a problem of minimizing a convex multivariate polynomial [15], so very soon the performance of these methods was increased up to the maximal limits [6,7,9], given by the theoretical lower complexity bounds [1,2]. In Sect. 3, we analyze the rate of convergence of the gradient method based on the relative smoothness condition [4,10], under the assumption that the gradient of the objective function is computed with a small absolute error. We need this analysis for replacing the exact value of the third derivative along two vectors by a finite difference of the gradients. In the resulting scheme, the Hessian is computed once and the gradient is computed $O\left(\ln \frac{1}{\epsilon}\right)$ times, where $\epsilon$ is the desired accuracy of the solution of the main problem. Recall that this rate of convergence is impossible for second-order schemes working with functions with Lipschitz-continuous third derivative (see [1,2]).
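
As an illustration of the finite-difference replacement mentioned above, one standard way to approximate the third directional derivative $D^3 f(x)[h,h]$ using only gradient values is the central difference (the concrete operator and step-size choice in the paper may differ):

$$
D^3 f(x)[h,h] \;\approx\; \frac{1}{\tau^2}\Big(\nabla f(x+\tau h) + \nabla f(x-\tau h) - 2\nabla f(x)\Big),
$$

whose error is of order $\tau$ (up to factors depending on $\Vert h \Vert$) when the third derivative is Lipschitz continuous, so $\tau$ can be tied to the accuracy required by the inexact-gradient analysis.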

Tensor Methods with Inexact Iteration
Relative Non-degeneracy and Approximate Gradients
Second-Order Implementations of the Third-Order Methods
Bounds for the Derivatives
Conclusion