Abstract

The gradient descent method minimizes an unconstrained nonlinear optimization problem at a rate of \({\mathcal {O}}(1/\sqrt{K})\), where K is the number of iterations performed by the gradient method. Traditionally, this analysis is obtained for smooth objective functions having Lipschitz continuous gradients. This paper considers a more general class of nonlinear programming problems in which the objective functions have Hölder continuous gradients. More precisely, for any function f in this class, denoted by \({{\mathcal {C}}}^{1,\nu }_L\), there exist \(\nu \in (0,1]\) and \(L>0\) such that for all \(\mathbf{x},\mathbf{y}\in {{\mathbb {R}}}^n\) the relation \(\Vert \nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert \le L \Vert \mathbf{x}-\mathbf{y}\Vert ^{\nu }\) holds. We prove that the gradient descent method converges globally to a stationary point and exhibits a convergence rate of \({\mathcal {O}}(1/K^{\frac{\nu }{\nu +1}})\) when the step-size is chosen properly, i.e., less than \([\frac{\nu +1}{L}]^{\frac{1}{\nu }}\Vert \nabla f(\mathbf{x}_k)\Vert ^{\frac{1}{\nu }-1}\). Moreover, the algorithm requires \({\mathcal {O}}(1/\epsilon ^{\frac{1}{\nu }+1})\) calls to an oracle to find a point \({\bar{\mathbf{x}}}\) such that \(\Vert \nabla f({{\bar{\mathbf{x}}}})\Vert <\epsilon \).
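
For illustration only, the following is a minimal sketch of gradient descent with the step-size bound quoted in the abstract, assuming the Hölder constant L and exponent \(\nu\) are known in advance. The function name `gradient_descent_holder`, the safety factor `c`, and the quadratic test problem are hypothetical choices made here for the example and are not taken from the paper.

```python
import numpy as np

def gradient_descent_holder(grad, x0, L, nu, c=0.5, eps=1e-6, max_iter=10000):
    """Gradient descent sketch for f with a Hölder continuous gradient.

    Assumes L and nu in (0, 1] are known. The step-size is a fraction
    c in (0, 1) of the bound [(nu + 1)/L]^(1/nu) * ||grad f(x_k)||^(1/nu - 1)
    stated in the abstract (a hypothetical way to satisfy "less than").
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        g_norm = np.linalg.norm(g)
        if g_norm < eps:  # approximate stationary point reached
            break
        step = c * ((nu + 1.0) / L) ** (1.0 / nu) * g_norm ** (1.0 / nu - 1.0)
        x = x - step * g
    return x

# Example usage on a smooth quadratic (nu = 1, L = largest eigenvalue of A):
A = np.array([[2.0, 0.0], [0.0, 1.0]])
grad_f = lambda x: A @ x
x_bar = gradient_descent_holder(grad_f, x0=[5.0, -3.0], L=2.0, nu=1.0)
print(x_bar)  # close to the minimizer at the origin
```

Note that for \(\nu = 1\) the rule reduces to a constant step-size proportional to 1/L, recovering the usual choice for functions with Lipschitz continuous gradients.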
