On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems

Constantin Christof, Julia Kowalczyk

doi:10.1007/s00365-023-09658-w

Open Access

https://doi.org/10.1007/s00365-023-09658-w

Journal: Constructive Approximation
Publication Date: Jun 14, 2023
Citations: 1
License type: CC BY 4.0

Affiliation: Technical University of Munich

Abstract

We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output whose activation functions contain an affine segment and whose hidden layers have width at least two. It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine. In contrast to previous works, our analysis covers all sampling and parameterization regimes, general differentiable loss functions, arbitrary continuous nonpolynomial activation functions, and both the finite- and infinite-dimensional setting. It is further shown that the appearance of the spurious local minima in the considered training problems is a direct consequence of the universal approximation theorem and that the underlying mechanisms also cause, e.g., $L^p$-best approximation problems to be ill-posed in the sense of Hadamard for all networks that do not have a dense image. The latter result also holds without the assumption of local affine linearity and without any conditions on the hidden layers.
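The following toy computation is one way to get a feel for the kind of spurious local minimum the abstract describes. It is a sketch under stated assumptions, not the authors' construction: it assumes a ReLU activation (continuous, nonpolynomial, with an affine segment), a single hidden layer of width two, a squared loss, and a hypothetical sampled target `max(x, 0)`. When every hidden neuron stays on the affine segment for all data points, the network can only realize affine functions nearby, so the best affine fit acts as a local minimum even though the same architecture can fit the data exactly.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def network(params, x):
    """One hidden layer of width two, scalar input, scalar output."""
    W1, b1, w2, b2 = params
    return relu(np.outer(x, W1) + b1) @ w2 + b2

def loss(params, x, y):
    """Mean squared error on the sample (x, y)."""
    return float(np.mean((network(params, x) - y) ** 2))

# Hypothetical training data: a non-affine target sampled on [-1, 1].
x = np.linspace(-1.0, 1.0, 21)
y = np.maximum(x, 0.0)

# Best affine fit to the data (ordinary least squares).
slope, intercept = np.polyfit(x, y, 1)

# Both hidden neurons have pre-activation x + 5 > 0 on the data, so ReLU
# acts as the identity there and the network equals slope * x + intercept.
affine_params = (
    np.array([1.0, 1.0]),
    np.array([5.0, 5.0]),
    np.array([slope / 2.0, slope / 2.0]),
    np.array([intercept - 5.0 * slope]),
)

# A different parameter choice reproduces the target exactly on the data.
exact_params = (
    np.array([1.0, 1.0]),
    np.array([0.0, 0.0]),
    np.array([1.0, 0.0]),
    np.array([0.0]),
)

loss_affine = loss(affine_params, x, y)
loss_exact = loss(exact_params, x, y)

# Random small perturbations keep all pre-activations positive, so the
# perturbed network is still affine on the data and cannot beat the
# least-squares affine fit.
rng = np.random.default_rng(0)

def perturb(params, scale):
    return tuple(p + scale * rng.standard_normal(p.shape) for p in params)

min_perturbed = min(
    loss(perturb(affine_params, 1e-3), x, y) for _ in range(1000)
)

print("loss at affine point:      ", loss_affine)
print("smallest perturbed loss:   ", min_perturbed)
print("loss at exact-fit point:   ", loss_exact)
```

In this sketch the offset `5.0` is arbitrary: any sufficiently large bias keeps the neurons on the affine segment and yields the same affine output, which hints at why such minima come as a continuum rather than as isolated points.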

