Abstract
This paper considers the analysis of continuous time gradient-based optimization algorithms through the lens of nonlinear contraction theory. It demonstrates that in the case of a time-invariant objective, most elementary results on gradient descent based on convexity can be replaced by much more general results based on contraction. In particular, gradient descent converges to a unique equilibrium if its dynamics are contracting in any metric, with convexity of the cost corresponding to the special case of contraction in the identity metric. More broadly, contraction analysis provides new insights for the case of geodesically-convex optimization, wherein non-convex problems in Euclidean space can be transformed to convex ones posed over a Riemannian manifold. In this case, natural gradient descent converges to a unique equilibrium if it is contracting in any metric, with geodesic convexity of the cost corresponding to contraction in the natural metric. New results using semi-contraction provide additional insights into the topology of the set of optimizers in the case when multiple optima exist. Furthermore, they show how semi-contraction may be combined with specific additional information to reach broad conclusions about a dynamical system. The contraction perspective also easily extends to time-varying optimization settings and allows one to recursively build large optimization structures out of simpler elements. Extensions to natural primal-dual optimization and game-theoretic contexts further illustrate the potential reach of these new perspectives.
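The central correspondence in the abstract, that gradient descent converges to a unique equilibrium when its dynamics are contracting, with convexity as contraction in the identity metric, can be illustrated numerically. The sketch below (an assumed example, not from the paper) integrates the gradient flow x' = -∇f(x) for a strongly convex quadratic from two initial conditions; contraction in the identity metric means the flow Jacobian -∇²f is uniformly negative definite, so the trajectories converge toward each other and to the unique equilibrium.

```python
import numpy as np

# Hedged sketch: gradient flow x' = -grad f(x) for the strongly convex
# quadratic f(x) = 0.5 x^T A x. The flow Jacobian is -A; since A is
# positive definite, the system is contracting in the identity metric.
def gradient_flow(x0, A, dt=0.01, steps=1000):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= dt * (A @ x)   # Euler step of x' = -A x = -grad f(x)
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # positive definite metric of contraction
xa = gradient_flow([5.0, -2.0], A)
xb = gradient_flow([-4.0, 6.0], A)
d0 = np.linalg.norm(np.array([5.0, -2.0]) - np.array([-4.0, 6.0]))
d1 = np.linalg.norm(xa - xb)
print(d1 < d0)  # distance between trajectories shrinks exponentially
```

Any two trajectories contract toward each other at a rate set by the smallest eigenvalue of A, which is the exponential convergence guarantee that contraction analysis certifies without invoking convexity directly.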
Highlights
This paper considers the analysis of continuous-time gradient-based optimization through the lens of nonlinear contraction theory
In the special case of a dense Hessian metric M(x) = ∇²ψ(x) arising from a potential ψ(x), continuous mirror descent provides an alternate method to compute continuous natural gradient descent. These methods can avoid the need to invert the metric in cases where an explicit inverse exists for the change of variables z = ∇ψ(x), or when (15) can be run at a fast time scale to invert the gradient map through dynamics
This paper has demonstrated that nonlinear contraction analysis provides a general perspective for analyzing and certifying the global convergence properties of gradient-based optimization algorithms
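The highlight on Hessian metrics can be sketched concretely. Assuming the negative-entropy potential ψ(x) = Σᵢ xᵢ log xᵢ as an illustrative example (not taken from the paper), the gradient map ∇ψ(x) = log x + 1 has the explicit inverse x = exp(z - 1), so continuous mirror descent z' = -∇f(x) reproduces the natural gradient flow x' = -M(x)⁻¹∇f(x) without ever forming or inverting the metric M(x) = ∇²ψ(x):

```python
import numpy as np

# Hedged sketch: for the (assumed) entropy potential psi(x) = sum x_i log x_i,
# the Hessian metric is M(x) = diag(1/x_i), so M(x)^{-1} = diag(x_i).
# Mirror descent integrates z' = -grad f(x) in the dual variable
# z = grad psi(x) = log x + 1, using the explicit inverse x = exp(z - 1).
c = np.array([1.0, 2.0])
grad_f = lambda x: x - c           # f(x) = 0.5 ||x - c||^2
dt, steps = 0.001, 5000

x_nat = np.array([0.5, 0.5])                 # natural gradient state
z = np.log(np.array([0.5, 0.5])) + 1.0       # mirror-descent dual state
for _ in range(steps):
    x_nat += dt * (-x_nat * grad_f(x_nat))   # x' = -M(x)^{-1} grad f(x)
    z += dt * (-grad_f(np.exp(z - 1.0)))     # z' = -grad f(x), x = exp(z-1)
x_mir = np.exp(z - 1.0)
print(np.allclose(x_nat, x_mir, atol=0.05))  # the two flows coincide
```

Both discretizations track the same continuous flow, which is the sense in which mirror descent "inverts the metric for free" whenever the change of variables z = ∇ψ(x) has an explicit inverse.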
Summary
This paper considers the analysis of continuous-time gradient-based optimization through the lens of nonlinear contraction theory. Geodesic convexity [12, 13] generalizes convexity to a Riemannian setting, with applicability to optimization on manifolds [14], as well as to conventional Euclidean settings where R^n is endowed with a manifold structure through the definition of a metric. We consider another class of conditions for the convergence of gradient and natural gradient descent to a globally optimal point. We consider the extensions of these results to natural gradient descent, where geodesic convexity of a function corresponds to contraction of its natural gradient system in the natural metric. In both cases, results highlight the topology of the set of optimizers in the case of semi-contraction, which would have most direct applicability to over-parameterized networks.
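The geodesic-convexity correspondence in the summary can be illustrated with a one-dimensional example (an assumption for illustration, not from the paper): f(x) = (log x)² on x > 0 is not convex in the Euclidean sense, since f''(x) = 2(1 - log x)/x² is negative for x > e, yet under the metric m(x) = 1/x² its natural gradient flow x' = -m(x)⁻¹f'(x) = -2x log x becomes the linear contracting system z' = -2z in the coordinate z = log x:

```python
import numpy as np

# Hedged sketch: f(x) = (log x)^2 is Euclidean-non-convex for x > e,
# but its natural gradient flow under the (assumed) metric m(x) = 1/x^2
# is contracting: in z = log x it reads z' = -2z.
def natural_gradient_flow(x0, dt=0.01, steps=500):
    x = float(x0)
    for _ in range(steps):
        x -= dt * 2.0 * x * np.log(x)   # Euler step of x' = -2 x log x
    return x

f_pp = lambda x: 2.0 * (1.0 - np.log(x)) / x**2
print(f_pp(10.0) < 0)                # Euclidean non-convexity at x = 10
xa = natural_gradient_flow(10.0)
xb = natural_gradient_flow(0.1)
print(abs(np.log(xa) - np.log(xb)))  # metric distance shrinks toward 0
```

Both trajectories collapse onto the unique minimizer x = 1, with the distance measured in the natural metric (here |log xa - log xb|) decaying exponentially, which is exactly contraction of the natural gradient system in the natural metric.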