Abstract
It is well known that a model can generalize even when it completely interpolates the training data, a phenomenon known as benign overfitting. Indeed, several works have theoretically shown that the minimum-norm interpolator can exhibit benign overfitting. On the other hand, deep learning models such as two-layer neural networks have been reported to outperform "shallow" learning models such as kernel methods under appropriate model sizes by adaptively learning basis functions from the data. This mechanism is called feature learning, and it is known empirically to be beneficial even when the model size is large. However, it is generally difficult to show that benign overfitting occurs in models with feature learning, especially for regression problems. In this study, we analyze the predictive error of the estimator obtained after one step of feature learning in a two-layer linear neural network optimized by gradient descent, and study the effect of feature learning on benign overfitting. The results show that feature learning reduces the bias compared to a one-layer linear regression model without feature learning, especially when the eigenvalues of the input covariance decay slowly. On the other hand, we show that the variance is hardly changed by feature learning. This differs significantly from existing results on benign overfitting without feature learning and indicates the usefulness of feature learning.
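As a rough numerical illustration of the kind of setting described above (not the paper's exact construction), the following sketch compares a minimum-norm linear interpolator with an estimator obtained by taking one gradient step on the first layer of a two-layer linear network f(x) = aᵀWx and then fitting the second layer by a minimum-norm solution. The dimensions, step size, noise level, target, and the slowly decaying spectrum of the input covariance are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: one gradient step of "feature learning" in a
# two-layer linear network f(x) = a^T W x, followed by a minimum-norm fit
# of the second layer, compared against plain minimum-norm linear regression.
# All dimensions and hyperparameters are hypothetical choices.

rng = np.random.default_rng(0)
n, d, m = 50, 200, 200          # samples, input dim, hidden width (overparameterized)

# Anisotropic inputs with slowly decaying covariance eigenvalues.
eigvals = 1.0 / (1.0 + np.arange(d)) ** 0.5
X = rng.normal(size=(n, d)) * np.sqrt(eigvals)
beta_star = rng.normal(size=d) / np.sqrt(d)    # ground-truth linear signal
y = X @ beta_star + 0.1 * rng.normal(size=n)   # noisy training targets

# Baseline: minimum-norm interpolator of a one-layer linear model.
beta_minnorm = np.linalg.pinv(X) @ y

# Two-layer linear net: first layer W (m x d), second layer a (m,).
W0 = rng.normal(size=(m, d)) / np.sqrt(d)
a0 = rng.normal(size=m) / np.sqrt(m)

# One gradient step on W for the squared loss (1/2n) ||X W^T a - y||^2.
eta = 1.0
resid = X @ W0.T @ a0 - y
grad_W = np.outer(a0, resid @ X) / n
W1 = W0 - eta * grad_W

# Minimum-norm fit of the second layer on the learned features Phi = X W1^T.
Phi = X @ W1.T
a1 = np.linalg.pinv(Phi) @ y

# Compare test error of the two interpolators on fresh data.
X_test = rng.normal(size=(2000, d)) * np.sqrt(eigvals)
y_test = X_test @ beta_star
err_linear = np.mean((X_test @ beta_minnorm - y_test) ** 2)
err_feature = np.mean((X_test @ W1.T @ a1 - y_test) ** 2)
print(f"min-norm linear regression test MSE: {err_linear:.4f}")
print(f"one-step feature learning test MSE:  {err_feature:.4f}")
```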