Abstract
In this paper, we present new second-order methods with convergence rate $O\left(k^{-4}\right)$, where $k$ is the iteration counter. This is faster than the existing lower bound for this type of scheme (Agarwal and Hazan in Proceedings of the 31st Conference on Learning Theory, PMLR, pp. 774–792, 2018; Arjevani et al. in Math Program 178(1–2):327–360, 2019), which is $O\left(k^{-7/2}\right)$. Our progress can be explained by a finer specification of the problem class. The main idea of this approach consists in implementing the third-order scheme from Nesterov (Math Program 186:157–183, 2021) using the second-order oracle. At each iteration of our method, we solve a nontrivial auxiliary problem by a linearly convergent scheme based on the relative non-degeneracy condition (Bauschke et al. in Math Oper Res 42:330–348, 2016; Lu et al. in SIOPT 28(1):333–354, 2018). During this process, the Hessian of the objective function is computed once, and the gradient is computed $O\left(\ln \frac{1}{\epsilon}\right)$ times, where $\epsilon$ is the desired accuracy of the solution of our problem.
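As one concrete illustration of how a second-order oracle can emulate the third-order information needed by the scheme of Nesterov (2021), the following minimal Python/NumPy sketch approximates the third derivative along two equal directions, $D^3 f(x)[h,h]$, by a central finite difference of gradients. The function names, the step size `tau`, and the test problem are illustrative assumptions; the paper's exact finite-difference formula and its error analysis may differ.

```python
import numpy as np

def third_derivative_along(grad, x, h, tau=1e-4):
    """Approximate the vector D^3 f(x)[h, h] using three gradient calls.

    Taylor expansion of the gradient gives
        grad(x + tau*h) + grad(x - tau*h) - 2*grad(x)
            = tau^2 * D^3 f(x)[h, h] + O(tau^4),
    so dividing by tau^2 yields a finite-difference estimate of the
    third derivative of f along the two (equal) directions h.
    """
    return (grad(x + tau * h) + grad(x - tau * h) - 2.0 * grad(x)) / tau**2

# Illustrative check on f(x) = sum(x_i^4), for which
# D^3 f(x)[h, h] = 24 * x * h^2 (componentwise).
if __name__ == "__main__":
    grad_f = lambda x: 4.0 * x**3
    x = np.array([1.0, -2.0, 0.5])
    h = np.array([0.3, 0.1, -0.7])
    approx = third_derivative_along(grad_f, x, h)
    exact = 24.0 * x * h**2
    # The difference is tiny: for a quartic f the formula is exact up to
    # floating-point cancellation in the gradient differences.
    print(np.max(np.abs(approx - exact)))
```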
Highlights
In recent years, the theory of high-order methods in convex optimization has been developed seemingly up to its natural limits.
Since the auxiliary problem in tensor methods can be posed as a problem of minimizing a convex multivariate polynomial [15], the performance of these methods was very soon increased up to the maximal limits [6,7,9] given by the theoretical lower complexity bounds [1,2].
We conclude that the existing classification of the problem classes, optimization schemes, and complexity bounds is not perfect
Summary
Since the auxiliary problem in tensor methods can be posed as a problem of minimizing a convex multivariate polynomial [15], the performance of these methods was very soon increased up to the maximal limits [6,7,9] given by the theoretical lower complexity bounds [1,2]. In Sect. 3, we analyze the rate of convergence of the gradient method based on the relative smoothness condition [4,10], under the assumption that the gradient of the objective function is computed with a small absolute error. We need this analysis for replacing the exact value of the third derivative along two vectors by a finite difference of the gradients. At each iteration, the Hessian is computed once, and the gradient is computed $O\left(\ln \frac{1}{\epsilon}\right)$ times, where $\epsilon$ is the desired accuracy of the solution of the main problem. Recall that this rate of convergence is impossible for second-order schemes working with functions with Lipschitz-continuous third derivative (see [1,2]).
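To make the role of this inner scheme more tangible, here is a minimal Python sketch of a linearly convergent gradient method run with gradients carrying a small absolute error. For simplicity it uses the squared Euclidean norm as the reference function, so the Bregman step collapses to an ordinary gradient step; the paper's auxiliary problem uses a non-Euclidean reference function adapted to the relative non-degeneracy condition, and the constants `L`, `mu`, and `delta` below are assumed to be known. With a fixed ratio $L/\mu$, the number of gradient calls grows only like $\ln(1/\epsilon)$, which is the behavior quoted above.

```python
import numpy as np

def inexact_linear_gradient_method(grad, x0, L, mu, delta=1e-9, eps=1e-6,
                                   max_iter=100_000, rng=None):
    """Sketch of a linearly convergent gradient scheme with inexact gradients.

    grad  : callable returning (an approximation of) the gradient
    L, mu : smoothness / non-degeneracy constants (assumed known)
    delta : absolute error added to each gradient evaluation
    eps   : target accuracy, measured here by the gradient norm
    Returns the final iterate and the number of gradient calls.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    calls = 0
    for _ in range(max_iter):
        g = grad(x) + delta * rng.standard_normal(x.shape)  # inexact gradient
        calls += 1
        if np.linalg.norm(g) <= eps:   # stop once the accuracy eps is reached
            break
        x = x - g / L                  # gradient step (Euclidean reference)
    return x, calls

# Illustration on a strongly convex quadratic: for a fixed ratio L/mu the
# call count scales like ln(1/eps), i.e. O(ln(1/eps)) gradient evaluations.
if __name__ == "__main__":
    A = np.diag([1.0, 4.0, 10.0])
    x, calls = inexact_linear_gradient_method(lambda v: A @ v, np.ones(3),
                                              L=10.0, mu=1.0)
    print(calls, np.linalg.norm(x))
```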