Shaping the learning landscape in neural networks around wide flat minima

Carlo Baldassi, Fabrizio Pittorino, Riccardo Zecchina

doi:10.1073/pnas.1908636117

Abstract

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these 2 features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex 1- and 2-layer neural network models that learn random patterns, and we derive a number of basic geometrical and algorithmic features that suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM), which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective, we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.
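As a rough illustration of what "wide flat" means for the error loss, the sketch below trains a 1-layer network (a perceptron) on random patterns and then measures how the training error degrades when Gaussian noise is added to the weights. This is only a simple numerical proxy for the width of a minimizer, not the local-entropy computation or the entropy-driven algorithms of the paper. The function names (`error_rate`, `perturbed_error`, `train_perceptron`), the plain perceptron rule, the noise scaling, and all sizes are assumptions chosen for the example.

```python
import math
import random


def error_rate(w, patterns, labels):
    """Error loss: fraction of patterns misclassified by sign(w . x)."""
    errors = 0
    for x, y in zip(patterns, labels):
        s = sum(wi * xi for wi, xi in zip(w, x))
        pred = 1 if s >= 0 else -1
        if pred != y:
            errors += 1
    return errors / len(patterns)


def perturbed_error(w, patterns, labels, sigma, n_samples, rng):
    """Average error rate over noisy copies of w.

    Noise is Gaussian with standard deviation sigma times the
    root-mean-square weight (an assumed scaling), so sigma is a
    relative perturbation size. A wider minimizer is expected to
    keep this average lower as sigma grows.
    """
    n = len(w)
    scale = sigma * math.sqrt(sum(wi * wi for wi in w) / n)
    total = 0.0
    for _ in range(n_samples):
        noisy = [wi + scale * rng.gauss(0.0, 1.0) for wi in w]
        total += error_rate(noisy, patterns, labels)
    return total / n_samples


def train_perceptron(patterns, labels, max_epochs=200):
    """Classic perceptron rule (a stand-in, not the paper's algorithm)."""
    w = [0.0] * len(patterns[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(patterns, labels):
            s = sum(wi * xi for wi, xi in zip(w, x))
            if y * s <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    n_inputs, n_patterns = 30, 15  # hypothetical sizes
    X = [[rng.choice((-1, 1)) for _ in range(n_inputs)]
         for _ in range(n_patterns)]
    y = [rng.choice((-1, 1)) for _ in range(n_patterns)]
    w = train_perceptron(X, y)
    for sigma in (0.0, 0.5, 1.0, 2.0):
        avg = perturbed_error(w, X, y, sigma, 50, rng)
        print(f"sigma={sigma:.1f}  average training error={avg:.3f}")
```

Comparing the resulting error-versus-`sigma` curves for two different minimizers of the same training set gives a crude way to rank them by width: the curve that rises more slowly belongs to the flatter solution.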

Journal: Proceedings of the National Academy of Sciences of the United States of America
Publication Date: Dec 23, 2019
Citations: 62
License type: CC BY-NC-ND 4.0
