The article presents new methods for finding critical points, including saddle points, of a function of several variables. Such problems arise in many fields of theoretical and applied science, for example, saddle-point construction in lens design, machine and deep learning, and problems of convex optimization and nonlinear programming (where necessary and sufficient conditions for a solution are formulated in terms of saddle points of the Lagrange function and proved in the Kuhn-Tucker theorem). When training neural networks, the training process must be repeated on large clusters to check the network's trainability with different loss functions and network depths. This means that thousands of new computations are run, each of which optimizes the loss function on large amounts of data, so any acceleration in finding critical points is a major advantage and saves computing resources. Many modern methods for finding saddle points rely on computing the Hessian matrix, inverting it, taking the scalar product of the gradient vector with the current vector, evaluating the full Lagrangian, and so on. However, all of these operations are computationally “expensive”, and it makes sense to bypass such complex calculations. The idea behind the modified standard gradient methods used in the article is to apply fixed-point search schemes for nonlinear discrete dynamical systems to gradient descent problems. It is assumed that these fixed points correspond to unstable equilibrium positions whose multipliers include values greater than one in absolute value. Averaged predictive control methods are used. Results of numerical modeling and visualization are presented in two tables, which indicate the basin of attraction of each critical point under each scheme, along with statistics on the convergence rates.
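To make the general idea concrete, here is a minimal sketch (not the article's actual scheme): plain gradient descent is treated as the map G whose fixed points are the critical points of f, and an averaged prediction over iterates of G, with coefficients summing to one, reshapes the multipliers so that an unstable saddle becomes attracting. The test function f(x, y) = x² - y², the step size, and the coefficients (2, -1) are all illustrative assumptions chosen so the arithmetic works out.

```python
import numpy as np

# Hypothetical test function f(x, y) = x^2 - y^2 with a saddle at the origin;
# this is an illustrative choice, not an example taken from the article.
def grad_f(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

h = 0.1  # gradient step; at the saddle the multipliers of G are 1-2h = 0.8 and 1+2h = 1.2

def G(p):
    """One step of plain gradient descent, viewed as a discrete dynamical system.
    Critical points of f are exactly the fixed points of G; the saddle is an
    unstable fixed point because the multiplier 1.2 exceeds one."""
    return p - h * grad_f(p)

def averaged_predictive_step(p, a=(2.0, -1.0)):
    """Two-term averaged predictive scheme x_{k+1} = a1*G(x_k) + a2*G(G(x_k)).
    The (assumed) coefficients (2, -1) sum to 1, so fixed points are preserved,
    and each multiplier mu of G becomes 2*mu - mu^2 = 1 - (1 - mu)^2, which maps
    both 0.8 and 1.2 to 0.96 -- the unstable saddle becomes attracting."""
    g1 = G(p)
    g2 = G(g1)
    return a[0] * g1 + a[1] * g2

p = np.array([1.0, 0.5])  # initial point, displaced along both eigendirections
for k in range(400):
    p = averaged_predictive_step(p)

print(p)  # close to the saddle point (0, 0); plain iteration of G diverges in y
```

Note the design point the sketch illustrates: each averaged step costs only two gradient evaluations and never forms, inverts, or even touches the Hessian; stabilization comes entirely from how the averaging polynomial acts on the multipliers.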