Abstract

In this paper we study lower bounds on the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data. We derive explicit generalization lower bounds for general biased estimators in the case of two-layer networks. For a linear activation function the bound is asymptotically tight. In the nonlinear case, we compare our bounds with an empirical study of the stochastic gradient descent algorithm. In addition, we derive bounds for unbiased estimators, which show that the latter have unacceptable performance for truly nonlinear networks. The analysis relies on elements from the theory of large random matrices.
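A minimal sketch (not from the paper) of the setting described above, under assumed details: a teacher-student setup in which a two-layer student network, whose hidden-layer width is of the same order as the number of training samples, is trained by plain stochastic gradient descent on squared loss, and its generalization error is estimated on held-out data. All architectural choices, hyperparameters, and variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions chosen so that the layer size is commensurate with the sample count.
d_in, width, n_train, n_test = 50, 100, 200, 2000

# Fixed nonlinear two-layer "teacher" generating the labels (assumed data model).
W_t = rng.normal(size=(d_in, width)) / np.sqrt(d_in)
a_t = rng.normal(size=(width, 1)) / np.sqrt(width)

def teacher(X):
    return np.tanh(X @ W_t) @ a_t

X_tr, X_te = rng.normal(size=(n_train, d_in)), rng.normal(size=(n_test, d_in))
y_tr, y_te = teacher(X_tr), teacher(X_te)

# Two-layer student y = tanh(X W) a, trained with mini-batch SGD on squared loss.
W = rng.normal(size=(d_in, width)) / np.sqrt(d_in)
a = rng.normal(size=(width, 1)) / np.sqrt(width)
lr, epochs, batch = 0.05, 200, 20

for _ in range(epochs):
    for idx in np.array_split(rng.permutation(n_train), n_train // batch):
        Xb, yb = X_tr[idx], y_tr[idx]
        H = np.tanh(Xb @ W)                       # hidden-layer activations
        err = H @ a - yb                          # residuals on the mini-batch
        grad_a = H.T @ err / len(idx)             # gradient w.r.t. output weights
        grad_W = Xb.T @ ((err @ a.T) * (1 - H**2)) / len(idx)  # w.r.t. first layer
        a -= lr * grad_a
        W -= lr * grad_W

# Empirical generalization (test) error of the SGD-trained student.
test_err = np.mean((np.tanh(X_te @ W) @ a - y_te) ** 2)
print(f"empirical generalization error: {test_err:.4f}")
```

Varying `n_train` relative to `width` in this sketch gives the kind of empirical SGD curve that the paper compares against its lower bounds in the nonlinear case.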
