Abstract
This paper provides a detailed study of the convergence properties of the cubic regularized symmetric rank-1 (SR1) method (CuREG-SR1) proposed in [2]. The main advantage of incorporating the cubic regularization technique into SR1 is to alleviate the problem of indefinite Hessian approximations arising in SR1. However, its convergence under the line search framework has received little study. Here, we first show that CuREG-SR1 converges to a first-order critical point. Moreover, we give a novel result that, under the uniform linear independence assumption, the difference between the approximate Hessian generated by CuREG-SR1 and the true Hessian is bounded. In addition, we show that for a problem of dimension d, CuREG-SR1 generates q − d superlinear steps every q iterations. We also propose a novel incremental CuREG-SR1 (ICuREG-SR1) algorithm that adapts SR1 and CuREG-SR1 efficiently to large-scale problems. The basic idea is to incorporate an incremental optimization scheme, which progressively updates information for objective functions involving a sum of individual functions, as frequently encountered in large-scale machine learning. Numerical experiments on several machine learning problems show that the proposed algorithm offers superior performance in terms of gradient magnitude compared with other conventional algorithms tested.
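To illustrate the finite-sum setting that the incremental scheme targets, the following is a minimal incremental-gradient sketch, not the ICuREG-SR1 update itself; the least-squares components, the data (A, b), and the fixed step size are hypothetical placeholders.

```python
import numpy as np

# Hypothetical finite-sum objective f(x) = (1/N) * sum_i f_i(x), the setting
# targeted by incremental schemes (here: least-squares components on (A, b)).
def component_grad(x, A, b, i):
    """Gradient of the i-th component f_i(x) = 0.5 * (A[i] @ x - b[i])**2."""
    return (A[i] @ x - b[i]) * A[i]

def incremental_pass(x, A, b, step=0.01):
    """One sweep over the components, updating x with one f_i at a time
    instead of the full gradient (a plain incremental-gradient sketch,
    not the ICuREG-SR1 update)."""
    for i in range(len(b)):
        x = x - step * component_grad(x, A, b, i)
    return x
```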
Highlights
Many practical engineering problems can be formulated as unconstrained or constrained optimization problems, e.g., computational biology [18], [19], wireless communications [16], [17], and machine learning [35], [41]
In this paper, we focus on two problems: the first is the convergence analysis of the cubic regularized symmetric rank-1 (CuREG-SR1) algorithm, and the second is to develop an efficient SR1-based algorithm for solving large-scale problems
Based on the line search framework and the motivation of CuREG-SR1 [2], we show that CuREG-SR1 converges to a first-order critical point
Summary
Many practical engineering problems can be formulated as unconstrained or constrained optimization problems, e.g., computational biology [18], [19], wireless communications [16], [17], and machine learning [35], [41]. We are concerned with quasi-Newton methods for the unconstrained optimization problem

min_{x ∈ R^n} f(x),  (1)

where f(x) is the objective function and x ∈ R^n is the optimization variable of dimension n. A variety of quasi-Newton methods solve problem (1) by using a quadratic model of (1): at each iteration k, the objective function f(x) is approximated at the current iterate x_k by a second-order Taylor series with an approximation of the Hessian matrix in lieu of the true Hessian. A classical form of the quasi-Newton update is

x_{k+1} = x_k − λ_k B_k^{-1} ∇f(x_k),  (2)

where B_k is the Hessian approximation and λ_k is the step size. To obtain a descent direction of the objective function from the approximated quadratic model, the corresponding linear system must be solved at each iteration.
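To make the update (2) and the SR1 Hessian update concrete, the following is a minimal sketch of a plain SR1 quasi-Newton step, not the CuREG-SR1 algorithm of [2]; the fixed step length lam and the skipping tolerance r are illustrative placeholders, and a practical implementation would choose λ_k by a line search.

```python
import numpy as np

def sr1_quasi_newton_step(grad, x, B, lam=1.0, r=1e-8):
    """One quasi-Newton iteration x_{k+1} = x_k - lam * B^{-1} grad f(x_k),
    followed by the SR1 update of the Hessian approximation B."""
    g = grad(x)
    p = -lam * np.linalg.solve(B, g)   # solve the linear system for the search direction
    x_new = x + p
    s = x_new - x                      # step taken
    y = grad(x_new) - g                # change in gradient
    v = y - B @ s
    # Standard SR1 safeguard: skip the update when the denominator is tiny,
    # which is the situation that makes plain SR1 prone to indefinite or
    # ill-conditioned approximations B.
    if abs(v @ s) > r * np.linalg.norm(s) * np.linalg.norm(v):
        B = B + np.outer(v, v) / (v @ s)
    return x_new, B
```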