Abstract

We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set \(s_1, \ldots , s_n \in {\mathbb {R}}^p\) with corresponding responses \(t_1,\ldots ,t_n \in {\mathbb {R}}^q\), fitting a \(k\)-layer neural network \(\nu _\theta : {\mathbb {R}}^p \rightarrow {\mathbb {R}}^q\) involves estimating the weights \(\theta \in {\mathbb {R}}^m\) via an ERM: $$\begin{aligned} \inf _{\theta \in {\mathbb {R}}^m} \ \sum _{i=1}^n \Vert t_i - \nu _\theta (s_i) \Vert _2^2. \end{aligned}$$ We show that even for \(k = 2\), this infimum is not attainable in general for common activations such as the ReLU, hyperbolic tangent, and sigmoid functions. In addition, we deduce that if one attempts to minimize such a loss function when its infimum is not attainable, the values of \(\theta \) necessarily diverge to \(\pm \infty \). We show that for the smooth activations \(\sigma (x)= 1/\bigl (1 + \exp (-x)\bigr )\) and \(\sigma (x)=\tanh (x)\), such failure to attain an infimum can happen on a subset of responses of positive measure. For the ReLU activation \(\sigma (x)=\max (0,x)\), we completely classify the cases where the ERM for a best two-layer neural network approximation attains its infimum. In recent applications of neural networks, where overfitting is commonplace, the failure to attain an infimum is avoided by ensuring that the system of equations \(t_i = \nu _\theta (s_i)\), \(i =1,\ldots ,n\), has a solution. For a two-layer ReLU-activated network, we show when such a system of equations has a solution generically, i.e., when such a neural network can be fitted perfectly with probability one.
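The phenomenon of an unattained infimum can be illustrated with a deliberately simple toy case. The sketch below is not the paper's construction: it assumes a hypothetical one-hidden-unit sigmoid network \(\nu _\theta (s) = a\,\sigma (ws + b)\) with \(\theta = (a, w, b)\) and \(p = q = 1\), and a hypothetical two-point training set \((s_1, t_1) = (-1, 0)\), \((s_2, t_2) = (1, 1)\). Because \(\sigma > 0\) everywhere, fitting \(t_1 = 0\) exactly would force \(a = 0\), which then cannot fit \(t_2 = 1\), so the loss is strictly positive for every finite \(\theta \). Yet along the path \(a = 1\), \(b = 0\), \(w \rightarrow \infty \) the loss tends to \(0\), so the infimum is \(0\) and is approached only as a weight diverges. This degenerate example sits on a measure-zero set of responses; the abstract's positive-measure statement is a stronger result that it does not demonstrate.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def loss(a, w, b, data):
    """Squared-error ERM objective for the toy network a * sigmoid(w*s + b)."""
    return sum((t - a * sigmoid(w * s + b)) ** 2 for s, t in data)


# Hypothetical two-point training set (an assumption for illustration).
data = [(-1.0, 0.0), (1.0, 1.0)]

# Follow the path a = 1, b = 0 while the inner weight w grows.
weights = [1.0, 5.0, 25.0, 125.0]
losses = [loss(1.0, w, 0.0, data) for w in weights]

for w, value in zip(weights, losses):
    print(f"w = {w:6.1f}  loss = {value:.3e}")
```

The printed losses shrink toward zero as \(w\) grows but never reach it, which matches the abstract's statement that minimizing a loss whose infimum is not attained drives the weights to diverge.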
