Abstract

In this paper, we analyze the effects of depth and width on the quality of local minima, without the strong overparameterization and simplification assumptions used in the literature. Without any simplification assumption, for deep nonlinear neural networks with the squared loss, we theoretically show that the quality of local minima tends to improve toward the global minimum value as depth and width increase. Furthermore, with a locally induced structure on deep nonlinear neural networks, the values of local minima of neural networks are theoretically proven to be no worse than the globally optimal values of corresponding classical machine learning models. We empirically support our theoretical observations with a synthetic data set as well as the MNIST, CIFAR-10, and SVHN data sets. Compared with previous studies that rely on strong overparameterization assumptions, the results in this paper do not require overparameterization and instead show the gradual effects of overparameterization as consequences of general results.
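To make the setting concrete, the following is a minimal sketch (not the paper's code) of the squared-loss objective whose local minima are analyzed: a fully connected deep ReLU network with a linear output layer. The architecture, widths, initialization, and synthetic data below are illustrative assumptions.

```python
# Minimal sketch of the objective L(theta) = (1/2) * sum_i ||f(x_i; theta) - y_i||^2
# for a deep nonlinear (ReLU) network; all sizes and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def init_params(widths):
    """He-style initialization for a fully connected ReLU network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(widths[:-1], widths[1:])]

def forward(x, params):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:          # nonlinearity on hidden layers only
            h = np.maximum(h, 0.0)
    return h

def squared_loss(params, X, Y):
    """Training objective whose local minima are studied."""
    return 0.5 * np.sum((forward(X, params) - Y) ** 2)

X = rng.normal(size=(64, 10))            # 64 inputs of dimension 10
Y = rng.normal(size=(64, 1))             # scalar-valued targets
params = init_params([10, 32, 32, 1])    # two hidden layers of width 32 (illustrative)
print(squared_loss(params, X, Y))
```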

Highlights

  • Deep learning with neural networks has been a significant practical success in many fields, including computer vision, machine learning, and artificial intelligence

  • We prove quantitative upper bounds on the quality of local minima, which show that the values of local minima of neural networks are guaranteed to be no worse than the globally optimal values of corresponding classical machine learning models, and that the guarantee can improve as depth and width increase (a minimal empirical sketch follows this list)

  • The CIFAR-10 (Krizhevsky & Hinton, 2009) data set consists of 32 × 32 color images that contain different types of objects such as “airplane,” “automobile,” and “cat.” The Street View House Numbers (SVHN) data set (Netzer et al., 2011) contains house digits collected by Google Street View, and we used the 32 × 32 color image version for the standard task of predicting the digits in the middle of these images
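The sketch below is a hedged empirical illustration of the second highlight, not the paper's experiment: on one training set, it compares the training squared loss reached by gradient-based training of a ReLU network (a local search) against the global optimum of a corresponding classical model, here ordinary least squares. The data, model choices, and hyperparameters (MLPRegressor, the width sweep, etc.) are assumptions for illustration.

```python
# Hedged illustration: training loss of a ReLU network found by local search
# versus the global optimum of a linear model on the same data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

# Global minimum of the linear model (solved in closed form inside LinearRegression).
linear = LinearRegression().fit(X, y)
linear_loss = 0.5 * np.sum((linear.predict(X) - y) ** 2)

# Local minima found by gradient-based training of increasingly wide networks.
for width in (8, 64, 512):
    net = MLPRegressor(hidden_layer_sizes=(width, width), activation="relu",
                       max_iter=5000, random_state=0).fit(X, y)
    net_loss = 0.5 * np.sum((net.predict(X) - y) ** 2)
    print(f"width={width:4d}  network loss={net_loss:9.3f}  linear loss={linear_loss:9.3f}")
```

In runs of this kind, the network's training loss is typically no worse than the linear model's globally optimal loss and tends to decrease as width grows, which is the qualitative behavior the highlight describes.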


Summary

Introduction

Deep learning with neural networks has been a significant practical success in many fields, including computer vision, machine learning, and artificial intelligence. A hope is that, beyond the worst-case scenarios, practical deep learning allows some additional structure or assumption that makes nonconvex high-dimensional optimization tractable. It has been shown, under strong simplification assumptions, that there are novel loss landscape structures in deep learning optimization that may play a role in making the optimization tractable (Dauphin et al., 2014; Choromanska, Henaff, Mathieu, Ben Arous, & LeCun, 2015; Kawaguchi, 2016). Another key observation is that if a neural network is strongly overparameterized, so that it can memorize any data set of a fixed size, then all stationary points (including all local minima and saddle points) become global minima, under some nondegeneracy assumptions. We prove quantitative upper bounds on the quality of local minima, which show that the values of local minima of neural networks are guaranteed to be no worse than the globally optimal values of corresponding classical machine learning models, and that the guarantee can improve as depth and width increase.
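The memorization observation above can be illustrated with a hedged sketch (not the cited papers' constructions): with enough hidden units, a one-hidden-layer ReLU network can fit any fixed data set exactly, here by fixing random first-layer weights and solving the last layer in closed form. The sizes and random-feature construction are illustrative assumptions.

```python
# Hedged sketch: once the hidden width reaches the data set size, the training
# squared loss can be driven to (numerically) zero, i.e., the data are memorized.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5                      # fixed data set of size n
X = rng.normal(size=(n, d))
y = rng.normal(size=n)            # arbitrary targets to memorize

for width in (10, 50, 500):       # width below, at, and above n
    W = rng.normal(size=(d, width))           # random first layer (kept fixed)
    H = np.maximum(X @ W, 0.0)                # hidden ReLU features, shape (n, width)
    v, *_ = np.linalg.lstsq(H, y, rcond=None) # best last-layer weights in closed form
    loss = 0.5 * np.sum((H @ v - y) ** 2)
    print(f"width={width:4d}  training squared loss={loss:.2e}")
```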

Preliminaries
Shallow Nonlinear Neural Networks with Scalar-Valued Output
Deep Nonlinear Neural Networks
Deep Nonlinear Neural Networks with Local Structure
Conclusion