The interpolation phase transition in neural networks: Memorization and generalization under lazy training

Andrea Montanari, Yiqiao Zhong

doi:10.1214/22-aos2211

Abstract

Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here, we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in d dimensions, and N hidden neurons. We assume that both the sample size n and the dimension d are large, and that they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime Nd ≫ n. This characterization implies as a corollary that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as Nd ≫ n and, therefore, the network can exactly interpolate arbitrary labels in the same regime. Our second main result is a characterization of the generalization error of NT ridge regression including, as a special case, min-ℓ2 norm interpolation. We prove that, as soon as Nd ≫ n, the test error is well approximated by that of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a “self-induced” term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular on log n / log d).
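The interpolation claim can be explored numerically. The following is a minimal NumPy sketch, not the authors' code, and it rests on assumptions the abstract does not state: a ReLU activation, covariates drawn uniformly from the sphere of radius sqrt(d), first-layer weights drawn uniformly from the unit sphere, a 1/sqrt(Nd) feature normalization, and illustrative sizes n = 200, d = 20, N = 100 (so Nd = 2000 is ten times n). The helper names `sample_sphere` and `nt_features` are hypothetical. The sketch builds the empirical NT kernel from the gradients with respect to the first-layer weights, reads off its minimum eigenvalue, and fits purely random labels with the min-ℓ2 norm (ridge = 0) solution.

```python
import numpy as np


def sample_sphere(rng, m, d, radius):
    """Draw m points uniformly from the sphere of the given radius in R^d."""
    g = rng.standard_normal((m, d))
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)


def nt_features(X, W):
    """NT feature map for the first-layer weights of a two-layer ReLU network.

    X has shape (n, d), W has shape (N, d). Block k of row i is
    sigma'(<w_k, x_i>) * x_i, scaled by 1/sqrt(N d) (an assumed normalization).
    """
    n, d = X.shape
    N = W.shape[0]
    act_deriv = (X @ W.T > 0).astype(float)        # ReLU derivative, shape (n, N)
    phi = act_deriv[:, :, None] * X[:, None, :]    # shape (n, N, d)
    return phi.reshape(n, N * d) / np.sqrt(N * d)


rng = np.random.default_rng(0)
n, d, N = 200, 20, 100                             # illustrative sizes, N*d >> n

X = sample_sphere(rng, n, d, np.sqrt(d))           # isotropic covariates (assumed law)
W = sample_sphere(rng, N, d, 1.0)                  # random first-layer weights

Phi = nt_features(X, W)
K = Phi @ Phi.T                                    # empirical NT kernel, shape (n, n)
lam_min = np.linalg.eigvalsh(K)[0]                 # eigvalsh sorts ascending

y = rng.choice([-1.0, 1.0], size=n)                # purely random labels
ridge = 0.0                                        # 0 gives min-l2 norm interpolation
coef = np.linalg.solve(K + ridge * np.eye(n), y)
theta = Phi.T @ coef                               # NT regression coefficients
train_resid = np.max(np.abs(Phi @ theta - y))

print("minimum eigenvalue of K:", lam_min)
print("largest training residual:", train_resid)
```

Setting `ridge` to a positive value turns the same solve into NT ridge regression; shrinking N until Nd falls below n makes K rank-deficient, so the random labels can no longer be fitted exactly.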

Journal: The Annals of Statistics
Publication Date: Oct 1, 2022
Citations: 21

Similar Papers

The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve
Song Mei ... Andrea Montanari
Communications on Pure and Applied Mathematics | VOL. 75
06 Jun 2021

Generalization error of random feature and kernel methods: Hypercontractivity and kernel matrix concentration
Song Mei ... Andrea Montanari
Applied and Computational Harmonic Analysis | VOL. 59
01 Jul 2022

Determine the optimal Hidden Layers and Neurons in the Generative Adversarial Networks topology for the Intrusion Detection Systems
Ali Lamjid ... Khairul Akram Zainol Ariffin
06 Oct 2022

Results on infinitely wide multi-layer perceptrons
Susumu Tsuchida
17 Dec 2020
