Abstract

Modern machine learning models often exhibit the benign overfitting phenomenon, which has recently been characterized using the double descent curves. In addition to the classical U-shaped learning curve, the learning risk undergoes another descent as we increase the number of parameters beyond a certain threshold. In this article, we examine the conditions under which benign overfitting occurs in the random feature (RF) models, that is, in a two-layer neural network with fixed first layer weights. Adopting a novel view of random features, we show that benign overfitting emerges because of the noise residing in such features. The noise may already exist in the data and propagates to the features, or it may be added by the user to the features directly. Such noise plays an implicit yet crucial regularization role in the phenomenon. In addition, we derive the explicit tradeoff between the number of parameters and the prediction accuracy, and for the first time demonstrate that overparameterized model can achieve the optimal learning rate in the minimax sense. Finally, our results indicate that the learning risk for overparameterized models has multiple, instead of double descent behavior, which is empirically verified in recent works. Supplementary materials for this article are available online.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call