Abstract

This paper considers the analysis of continuous-time gradient-based optimization algorithms through the lens of nonlinear contraction theory. It demonstrates that in the case of a time-invariant objective, most elementary results on gradient descent based on convexity can be replaced by much more general results based on contraction. In particular, gradient descent converges to a unique equilibrium if its dynamics are contracting in any metric, with convexity of the cost corresponding to the special case of contraction in the identity metric. More broadly, contraction analysis provides new insights for the case of geodesically convex optimization, wherein non-convex problems in Euclidean space can be transformed to convex ones posed over a Riemannian manifold. In this case, natural gradient descent converges to a unique equilibrium if it is contracting in any metric, with geodesic convexity of the cost corresponding to contraction in the natural metric. New results using semi-contraction provide additional insights into the topology of the set of optimizers when multiple optima exist. Furthermore, they show how semi-contraction may be combined with specific additional information to reach broad conclusions about a dynamical system. The contraction perspective also easily extends to time-varying optimization settings and allows one to recursively build large optimization structures out of simpler elements. Extensions to natural primal-dual optimization and game-theoretic contexts further illustrate the potential reach of these new perspectives.
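
As a minimal numerical sketch of this correspondence (the quadratic cost, the matrix A, and the script structure are illustrative assumptions, not taken from the paper), the gradient flow of a strongly convex cost is contracting in the identity metric, so any two of its trajectories converge toward each other exponentially:

```python
# Minimal sketch (illustrative quadratic cost, not from the paper): the gradient
# flow dx/dt = -grad f(x) of a strongly convex f(x) = 0.5 x^T A x has Jacobian -A,
# so it is contracting in the identity metric and any two trajectories approach
# each other at least as fast as exp(-lambda_min(A) * t).
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # assumed symmetric positive-definite Hessian

def gradient_flow(t, x):
    return -A @ x            # dx/dt = -grad f(x)

t_eval = np.linspace(0.0, 5.0, 200)
sol1 = solve_ivp(gradient_flow, (0.0, 5.0), [2.0, -1.0], t_eval=t_eval)
sol2 = solve_ivp(gradient_flow, (0.0, 5.0), [-1.5, 3.0], t_eval=t_eval)

# Gap between trajectories vs. the contraction-rate bound exp(-lambda_min(A) * t).
gap = np.linalg.norm(sol1.y - sol2.y, axis=0)
bound = gap[0] * np.exp(-np.linalg.eigvalsh(A).min() * t_eval)
print(gap[-1], bound[-1])    # the gap stays below the exponential bound
```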

Highlights

  • This paper considers the analysis of continuous-time gradient-based optimization through the lens of nonlinear contraction theory

  • In the special case of a dense Hessian metric M(x) = ∇²ψ(x) derived from a potential ψ(x), continuous mirror descent provides an alternate method to compute continuous natural gradient. These methods can avoid the need to invert the metric in cases where an explicit inverse exists for the change of variables z = ∇ψ(x), or when (15) can be run at a fast time scale to invert the gradient map through dynamics (see the sketch after this list)

  • This paper has demonstrated that nonlinear contraction analysis provides a general perspective for analyzing and certifying the global convergence properties of gradient-based optimization algorithms
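
As a sketch of the Hessian-metric highlight above (the negative-entropy potential, the quadratic cost, and all names are illustrative assumptions rather than the paper's own example), the following compares the continuous natural gradient flow dx/dt = -M(x)^{-1} ∇f(x) with M(x) = ∇²ψ(x) against the equivalent continuous mirror descent dz/dt = -∇f(x), with x recovered from z = ∇ψ(x); the latter avoids inverting the metric because ∇ψ has an explicit inverse here:

```python
# Minimal sketch (assumed negative-entropy potential and quadratic cost, not the
# paper's example): with Hessian metric M(x) = Hess psi(x), continuous natural
# gradient flow  dx/dt = -M(x)^{-1} grad f(x)  coincides with continuous mirror
# descent  dz/dt = -grad f(x), x = (grad psi)^{-1}(z), which never inverts M(x)
# because grad psi(x) = 1 + log(x) has the explicit inverse x = exp(z - 1).
import numpy as np
from scipy.integrate import solve_ivp

b = np.array([0.3, 0.7])

def grad_f(x):                        # hypothetical cost f(x) = 0.5 * ||x - b||^2
    return x - b

def natural_gradient(t, x):
    return -np.diag(x) @ grad_f(x)    # M(x)^{-1} = diag(x) for psi = sum x_i log x_i

def mirror_descent(t, z):
    x = np.exp(z - 1.0)               # invert the gradient map z = grad psi(x)
    return -grad_f(x)

x0 = np.array([0.5, 0.5])
z0 = 1.0 + np.log(x0)                 # matching initial condition in mirror coordinates
sol_ng = solve_ivp(natural_gradient, (0.0, 10.0), x0, rtol=1e-8, atol=1e-10)
sol_md = solve_ivp(mirror_descent, (0.0, 10.0), z0, rtol=1e-8, atol=1e-10)
print(sol_ng.y[:, -1], np.exp(sol_md.y[:, -1] - 1.0))   # the two flows agree
```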


Summary

Introduction

This paper considers the analysis of continuous-time gradient-based optimization through the lens of nonlinear contraction theory. Geodesic convexity [12, 13] generalizes convexity to a Riemannian setting, with applicability to optimization on manifolds [14], as well as to conventional Euclidean settings in which R^n is endowed with a manifold structure through the definition of a metric. We consider another class of conditions for the convergence of gradient and natural gradient descent to a globally optimal point. For natural gradient descent, geodesic convexity of a function corresponds to contraction of its natural gradient system in the natural metric. In both cases, the results highlight the topology of the set of optimizers under semi-contraction, which has the most direct applicability to over-parameterized networks.
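
As an illustrative sketch of this idea (the scalar cost and the log-coordinate metric below are assumptions chosen for exposition, not one of the paper's examples), a cost that is non-convex in Euclidean coordinates can still yield a globally contracting natural gradient flow once the metric renders the problem geodesically convex:

```python
# Illustrative sketch (the scalar cost and log-coordinate metric are assumptions
# for exposition): f(x) = 0.5 * (log x - a)^2 is non-convex on x > 0, since
# f''(x) = (1 + a - log x) / x^2 changes sign, yet the natural gradient flow in
# the metric M(x) = 1/x^2 reads dy/dt = -(y - a) in the coordinate y = log x,
# i.e. it is globally contracting and converges to the optimum x* = exp(a).
import numpy as np
from scipy.integrate import solve_ivp

a = 1.0

def grad_f(x):
    return (np.log(x) - a) / x

def natural_gradient_flow(t, x):
    return -(x ** 2) * grad_f(x)      # dx/dt = -M(x)^{-1} f'(x) with M(x) = 1/x^2

t_eval = np.linspace(0.0, 8.0, 100)
sol1 = solve_ivp(natural_gradient_flow, (0.0, 8.0), [0.1], t_eval=t_eval)
sol2 = solve_ivp(natural_gradient_flow, (0.0, 8.0), [50.0], t_eval=t_eval)

# In y = log x coordinates the two trajectories contract toward each other.
gap = np.abs(np.log(sol1.y[0]) - np.log(sol2.y[0]))
print(gap[0], gap[-1], sol1.y[0, -1], np.exp(a))
```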

Contraction analysis of gradient systems
Relationships between convexity and contraction
Relationship between geodesic convexity and contraction
Examples
Non-autonomous systems and virtual systems
Primal-dual optimization
Primal-dual dynamics in natural adaptive control
Natural primal-dual
Applying contraction tools to g-convex optimization
Sum of g-convex
Skew-symmetric feedback coupling
Hierarchical natural gradient
Conclusions