Strict Saddle Points Research Articles

Gradient-related first-order methods have become the workhorse of large-scale numerical optimization problems. Many of these problems involve nonconvex objective functions with multiple saddle points, which necessitates an understanding of the behavior of discrete trajectories of first-order methods within the geometrical landscape of these functions. This paper concerns convergence of first-order discrete methods to a local minimum of nonconvex optimization problems that comprise strict-saddle points within the geometrical landscape. To this end, it focuses on analysis of discrete gradient trajectories around saddle neighborhoods, derives sufficient conditions under which these trajectories can escape strict-saddle neighborhoods in linear time, explores the contractive and expansive dynamics of these trajectories in neighborhoods of strict-saddle points that are characterized by gradients of moderate magnitude, characterizes the non-curving nature of these trajectories, and highlights the inability of these trajectories to re-enter the neighborhoods around strict-saddle points after exiting them. Based on these insights and analyses, the paper then proposes a simple variant of the vanilla gradient descent algorithm, termed Curvature Conditioned Regularized Gradient Descent (CCRGD) algorithm, which utilizes a check for an initial boundary condition to ensure its trajectories can escape strict-saddle neighborhoods in linear time. Convergence analysis of the CCRGD algorithm, which includes its rate of convergence to a local minimum, is also presented in the paper. Numerical experiments are then provided on a test function as well as a low-rank matrix factorization problem to evaluate the efficacy of the proposed algorithm.

Read full abstract

AbstractWe study the nonconvex optimization landscape for maximum likelihood estimation in the discrete orbit recovery model with Gaussian noise. This is a statistical model motivated by applications in molecular microscopy and image processing, where each measurement of an unknown object is subject to an independent random rotation from a known rotational group. Equivalently, it is a Gaussian mixture model where the mixture centers belong to a group orbit.We show that fundamental properties of the likelihood landscape depend on the signal‐to‐noise ratio and the group structure. At low noise, this landscape is “benign” for any discrete group, possessing no spurious local optima and only strict saddle points. At high noise, this landscape may develop spurious local optima, depending on the specific group. We discuss several positive and negative examples, and provide a general condition that ensures a globally benign landscape at high noise. For cyclic permutations of coordinates on (multireference alignment), there may be spurious local optima when , and we establish a correspondence between these local optima and those of a surrogate function of the phase variables in the Fourier domain.We show that the Fisher information matrix transitions from resembling that of a single Gaussian distribution in low noise to having a graded eigenvalue structure in high noise, which is determined by the graded algebra of invariant polynomials under the group action. In a local neighborhood of the true object, where the neighborhood size is independent of the signal‐to‐noise ratio, the landscape is strongly convex in a reparametrized system of variables given by a transcendence basis of this polynomial algebra. We discuss implications for optimization algorithms, including slow convergence of expectation‐maximization, and possible advantages of momentum‐based acceleration and variable reparametrization for first‐ and second‐order descent methods. © 2021 Wiley Periodicals LLC.

Read full abstract

Strict Saddle Points Research Articles

Articles published on Strict Saddle Points

Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks

On the Global Convergence of Randomized Coordinate Gradient Descent for Nonconvex Optimization

From the simplex to the sphere: faster constrained optimization using the Hadamard parametrization

Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm

Distributed Synchronous and Asynchronous Algorithms for Semidefinite Programming With Diagonal Constraints

Escaping Strict Saddle Points of the Moreau Envelope in Nonsmooth Optimization

Embedding Principle: A Hierarchical Structure of Loss Landscape of Deep Neural Networks

Likelihood landscape and maximum likelihood estimation for the discrete orbit recovery model

A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization

Distributed Learning in Non-Convex Environments— Part II: Polynomial Escape From Saddle-Points

Second-Order Guarantees of Distributed Gradient Algorithms

Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points

Combining Stochastic Adaptive Cubic Regularization with Negative Curvature for Nonconvex Optimization

Structured Local Optima in Sparse Blind Deconvolution

Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

First-order methods almost always avoid strict saddle points

An Envelope for Davis–Yin Splitting and Strict Saddle-Point Avoidance

Behavior of accelerated gradient methods near critical points of nonconvex functions

Homogenization and non-homogenization of certain non-convex Hamilton–Jacobi equations

Strict Efficiency in Vector Optimization with Nearly Convexlike Set-Valued Maps

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Strict Saddle Points Research Articles

Articles published on Strict Saddle Points

Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks

On the Global Convergence of Randomized Coordinate Gradient Descent for Nonconvex Optimization

From the simplex to the sphere: faster constrained optimization using the Hadamard parametrization

Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm

Distributed Synchronous and Asynchronous Algorithms for Semidefinite Programming With Diagonal Constraints

Escaping Strict Saddle Points of the Moreau Envelope in Nonsmooth Optimization

Embedding Principle: A Hierarchical Structure of Loss Landscape of Deep Neural Networks

Likelihood landscape and maximum likelihood estimation for the discrete orbit recovery model

A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization

Distributed Learning in Non-Convex Environments— Part II: Polynomial Escape From Saddle-Points

Second-Order Guarantees of Distributed Gradient Algorithms

Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points

Combining Stochastic Adaptive Cubic Regularization with Negative Curvature for Nonconvex Optimization

Structured Local Optima in Sparse Blind Deconvolution

Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

First-order methods almost always avoid strict saddle points

An Envelope for Davis–Yin Splitting and Strict Saddle-Point Avoidance

Behavior of accelerated gradient methods near critical points of nonconvex functions

Homogenization and non-homogenization of certain non-convex Hamilton–Jacobi equations

Strict Efficiency in Vector Optimization with Nearly Convexlike Set-Valued Maps