Abstract

By using the viewpoint of modern computational algebraic geometry, we explore properties of the optimization landscapes of deep linear neural network models. After clarifying the various definitions of "flat" minima, we show that the geometrically flat minima, which are merely artifacts of residual continuous symmetries of the deep linear networks, can be straightforwardly removed by a generalized L2 regularization. We then use algebraic geometry to establish upper bounds on the number of isolated stationary points of these networks. Combining these upper bounds with methods from numerical algebraic geometry, we find all stationary points for networks of modest depth and matrix size. We demonstrate that, in the presence of non-zero regularization, deep linear networks can indeed possess local minima that are not global minima. Finally, we show that even though the number of stationary points grows as the number of neurons increases (or as the regularization parameter decreases), higher-index saddles are surprisingly rare.
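To make the setup concrete, here is a minimal sketch (not taken from the paper) of how the stationary points of a regularized deep linear network reduce to a polynomial system that can be solved exactly. It assumes a toy depth-2 network with scalar weights w1, w2, a single training pair (x, y) = (1, 1), and an arbitrary regularization strength lam = 1/10; for matrix weights and realistic sizes one would turn to numerical algebraic geometry (homotopy continuation) rather than symbolic solving.

import sympy as sp

# Toy depth-2 "deep linear network" with scalar weights and squared loss
# on the single sample (x, y) = (1, 1), plus a hypothetical L2 term.
w1, w2 = sp.symbols('w1 w2', real=True)
lam = sp.Rational(1, 10)
loss = (w2 * w1 - 1)**2 + lam * (w1**2 + w2**2)

# Stationary points solve the polynomial system dL/dw1 = dL/dw2 = 0.
grad = [sp.diff(loss, w) for w in (w1, w2)]
stationary = sp.solve(grad, [w1, w2], dict=True)

# Classify each real stationary point by the eigenvalues of the Hessian.
H = sp.hessian(loss, (w1, w2))
for pt in stationary:
    if all(v.is_real for v in pt.values()):
        eigs = list(H.subs(pt).eigenvals().keys())
        print(pt, 'Hessian eigenvalues:', eigs)

For this toy problem there are three real stationary points: two symmetric global minima and a saddle at the origin. With lam = 0 the minimizers would instead form the continuous curve w1*w2 = 1 (an artifact of the rescaling symmetry w1 -> c*w1, w2 -> w2/c), illustrating how the regularization removes the flat directions and leaves only isolated stationary points.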
