Abstract

Interpolators-estimators that achieve zero training error-have attracted growing attention in machine learning, mainly because state-of-the art neural networks appear to be models of this type. In this paper, we study minimum ℓ 2 norm ("ridgeless") interpolation least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters p is of the same order as the number of samples n. We consider two different models for the feature distribution: a linear model, where the feature vectors are obtained by applying a linear transform to a vector of i.i.d. entries, x i = Σ1/2 z i (with ); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, xi = φ(Wz i ) (with , a matrix of i.i.d. entries, and φ an activation function acting componentwise on Wz i ). We recover-in a precise quantitative way-several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call