Abstract

Multilayer neural networks are among the most powerful models in machine learning, yet the fundamental reasons for this success defy mathematical understanding. Learning a neural network requires optimizing a nonconvex high-dimensional objective (risk function), a problem that is usually attacked using stochastic gradient descent (SGD). Does SGD converge to a global optimum of the risk or only to a local optimum? In the former case, does this happen because local minima are absent or because SGD somehow avoids them? In the latter, why do local minima reached by SGD have good generalization properties? In this paper, we consider a simple case, namely two-layer neural networks, and prove that, in a suitable scaling limit, SGD dynamics is captured by a certain nonlinear partial differential equation (PDE) that we call distributional dynamics (DD). We then consider several specific examples and show how DD can be used to prove convergence of SGD to networks with nearly ideal generalization error. This description allows for "averaging out" some of the complexities of the landscape of neural networks and can be used to prove a general convergence result for noisy SGD.
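To make the scaling limit more concrete, the following is a hedged sketch of the kind of distributional dynamics the abstract refers to, written for a two-layer network under square loss; the symbols $\sigma_*$, $\xi$, $V$, $U$, and $\Psi$ are notation introduced here for illustration, and the precise assumptions and statements are in the full paper.

```latex
% Two-layer network with N hidden units and per-unit parameters \theta_i:
%   \hat y(x;\boldsymbol\theta) = \frac{1}{N}\sum_{i=1}^{N} \sigma_*(x;\theta_i).
% In the mean-field scaling (N \to \infty, step size of order 1/N), the empirical
% distribution of the \theta_i's, \hat\rho^{(N)}_t = \frac{1}{N}\sum_i \delta_{\theta_i},
% is expected to converge to a measure \rho_t solving a nonlinear PDE of the form
\[
  \partial_t \rho_t
    = 2\,\xi(t)\, \nabla_\theta \!\cdot\! \big( \rho_t \, \nabla_\theta \Psi(\theta;\rho_t) \big),
  \qquad
  \Psi(\theta;\rho) = V(\theta) + \int U(\theta,\theta')\,\rho(\mathrm{d}\theta'),
\]
% where \xi(t) is a (possibly time-dependent) step-size schedule and, under square loss,
\[
  V(\theta) = -\,\mathbb{E}\{\, y\, \sigma_*(x;\theta) \,\},
  \qquad
  U(\theta_1,\theta_2) = \mathbb{E}\{\, \sigma_*(x;\theta_1)\, \sigma_*(x;\theta_2) \,\}.
\]
% This is a gradient flow in the space of probability measures: the complexity of the
% finite-N landscape is "averaged out" into the evolution of a single distribution.
```

The sketch is meant only to indicate how SGD on individual weights turns into a deterministic evolution of a distribution over weights in the large-network limit; conditions on the activation, data distribution, and step size are omitted here.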
