Abstract

In this paper, we analyze the effects of depth and width on the quality of local minima, without the strong overparameterization and simplification assumptions used in the literature. Without any simplification assumption, for deep nonlinear neural networks with the squared loss, we theoretically show that the quality of local minima tends to improve toward the global minimum value as depth and width increase. Furthermore, with a locally induced structure on deep nonlinear neural networks, the values of local minima of neural networks are theoretically proven to be no worse than the globally optimal values of corresponding classical machine learning models. We empirically support our theoretical observations with a synthetic data set as well as the MNIST, CIFAR-10, and SVHN data sets. Compared with previous studies that rely on strong overparameterization assumptions, the results in this paper do not require overparameterization and instead show the gradual effects of overparameterization as consequences of general results.
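To make the setting concrete, the following is a minimal sketch (not the paper's code) of the squared-loss objective whose local minima are analyzed: a fully connected deep ReLU network with a linear output layer. The architecture, widths, initialization, and synthetic data below are illustrative assumptions.

```python
# Minimal sketch of the objective L(theta) = (1/2) * sum_i ||f(x_i; theta) - y_i||^2
# for a deep nonlinear (ReLU) network; all sizes and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def init_params(widths):
    """He-style initialization for a fully connected ReLU network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(widths[:-1], widths[1:])]

def forward(x, params):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:          # nonlinearity on hidden layers only
            h = np.maximum(h, 0.0)
    return h

def squared_loss(params, X, Y):
    """Training objective whose local minima are studied."""
    return 0.5 * np.sum((forward(X, params) - Y) ** 2)

X = rng.normal(size=(64, 10))            # 64 inputs of dimension 10
Y = rng.normal(size=(64, 1))             # scalar-valued targets
params = init_params([10, 32, 32, 1])    # two hidden layers of width 32 (illustrative)
print(squared_loss(params, X, Y))
```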

Highlights

  • Deep learning with neural networks has been a significant practical success in many fields, including computer vision, machine learning, and artificial intelligence

  • We prove quantitative upper bounds on the quality of local minima, which show that the values of local minima of neural networks are guaranteed to be no worse than the globally optimal values of corresponding classical machine learning models, and that the guarantee can improve as depth and width increase (a minimal empirical sketch follows this list)

  • The CIFAR-10 (Krizhevsky & Hinton, 2009) data set consists of 32 × 32 color images that contain different types of objects such as “airplane,” “automobile,” and “cat.” The Street View House Numbers (SVHN) data set (Netzer et al., 2011) contains house digits collected by Google Street View, and we used the 32 × 32 color image version for the standard task of predicting the digits in the middle of these images
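The sketch below is a hedged empirical illustration of the second highlight, not the paper's experiment: on one training set, it compares the training squared loss reached by gradient-based training of a ReLU network (a local search) against the global optimum of a corresponding classical model, here ordinary least squares. The data, model choices, and hyperparameters (MLPRegressor, the width sweep, etc.) are assumptions for illustration.

```python
# Hedged illustration: training loss of a ReLU network found by local search
# versus the global optimum of a linear model on the same data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

# Global minimum of the linear model (solved in closed form inside LinearRegression).
linear = LinearRegression().fit(X, y)
linear_loss = 0.5 * np.sum((linear.predict(X) - y) ** 2)

# Local minima found by gradient-based training of increasingly wide networks.
for width in (8, 64, 512):
    net = MLPRegressor(hidden_layer_sizes=(width, width), activation="relu",
                       max_iter=5000, random_state=0).fit(X, y)
    net_loss = 0.5 * np.sum((net.predict(X) - y) ** 2)
    print(f"width={width:4d}  network loss={net_loss:9.3f}  linear loss={linear_loss:9.3f}")
```

In runs of this kind, the network's training loss is typically no worse than the linear model's globally optimal loss and tends to decrease as width grows, which is the qualitative behavior the highlight describes.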


Summary

Introduction

Deep learning with neural networks has been a significant practical success in many fields, including computer vision, machine learning, and artificial intelligence. A hope is that, beyond the worst-case scenarios, practical deep learning allows some additional structure or assumption that makes nonconvex high-dimensional optimization tractable. It has been shown, under strong simplification assumptions, that there are novel loss landscape structures in deep learning optimization that may play a role in making the optimization tractable (Dauphin et al., 2014; Choromanska, Henaff, Mathieu, Ben Arous, & LeCun, 2015; Kawaguchi, 2016). Another key observation is that if a neural network is strongly overparameterized, so that it can memorize any data set of a fixed size, then all stationary points (including all local minima and saddle points) become global minima, under some nondegeneracy assumptions. We prove quantitative upper bounds on the quality of local minima, which show that the values of local minima of neural networks are guaranteed to be no worse than the globally optimal values of corresponding classical machine learning models, and that the guarantee can improve as depth and width increase.
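The memorization observation above can be illustrated with a hedged sketch (not the cited papers' constructions): with enough hidden units, a one-hidden-layer ReLU network can fit any fixed data set exactly, here by fixing random first-layer weights and solving the last layer in closed form. The sizes and random-feature construction are illustrative assumptions.

```python
# Hedged sketch: once the hidden width reaches the data set size, the training
# squared loss can be driven to (numerically) zero, i.e., the data are memorized.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5                      # fixed data set of size n
X = rng.normal(size=(n, d))
y = rng.normal(size=n)            # arbitrary targets to memorize

for width in (10, 50, 500):       # width below, at, and above n
    W = rng.normal(size=(d, width))           # random first layer (kept fixed)
    H = np.maximum(X @ W, 0.0)                # hidden ReLU features, shape (n, width)
    v, *_ = np.linalg.lstsq(H, y, rcond=None) # best last-layer weights in closed form
    loss = 0.5 * np.sum((H @ v - y) ** 2)
    print(f"width={width:4d}  training squared loss={loss:.2e}")
```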

Preliminaries
Shallow Nonlinear Neural Networks with Scalar-Valued Output
Deep Nonlinear Neural Networks
Deep Nonlinear Neural Networks with Local Structure
Conclusion