Feature learning and generalization error analysis of two-layer linear neural networks for high-dimensional inputs

Hayato Nishimori, Taiji Suzuki

doi:10.1007/s41884-024-00142-3

Abstract

It is well known that a model can generalize even when it completely interpolates the training data, a phenomenon known as benign overfitting. Indeed, several works have theoretically shown that the minimum-norm interpolator can exhibit benign overfitting. On the other hand, deep learning models such as two-layer neural networks have been reported to outperform “shallow” learning models such as kernel methods under appropriate model sizes, because they adaptively learn basis functions suited to the data. This mechanism is called feature learning, and it is empirically known to be beneficial even when the model size is large. However, it is generally difficult to show that benign overfitting occurs in models with feature learning, especially for regression problems. In this study, we analyze the prediction error of the estimator obtained after one step of feature learning in a two-layer linear neural network optimized by gradient descent, and we study the effect of feature learning on benign overfitting. The results show that feature learning reduces the bias compared with a one-layer linear regression model without feature learning, especially when the eigenvalues of the input covariance decay slowly. On the other hand, we show that the variance is hardly changed by feature learning. This differs significantly from the results on benign overfitting in the setting without feature learning and indicates the usefulness of feature learning.
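The abstract measures the effect of feature learning by how it changes the bias and the variance of an interpolating linear estimator. As a minimal sketch of the baseline side only (the one-layer minimum-norm interpolator without feature learning, not the authors' two-layer estimator), the NumPy code below computes the standard closed-form bias and variance of the excess risk and checks their sum by Monte Carlo. The Gaussian design with covariance eigenvalues i^(-alpha), the noise level, and the helper names `power_law_design`, `bias_variance`, and `monte_carlo_risk` are all illustrative assumptions.

```python
import numpy as np


def power_law_design(n, d, alpha, rng):
    """Hypothetical setup: n Gaussian inputs in R^d with diagonal covariance
    whose eigenvalues are i**(-alpha); small alpha means slow decay."""
    eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)
    X = rng.standard_normal((n, d)) * np.sqrt(eigs)  # column i has variance eigs[i]
    return X, np.diag(eigs)


def bias_variance(X, theta_star, Sigma, sigma2):
    """Closed-form bias and variance of the excess risk of the minimum-norm
    interpolator theta_hat = pinv(X) @ y, with y = X @ theta_star + eps and
    eps having i.i.d. entries of mean 0 and variance sigma2."""
    X_pinv = np.linalg.pinv(X)  # shape (d, n)
    d = X.shape[1]
    # Component of theta_star outside the row space of X: never recovered.
    residual = (np.eye(d) - X_pinv @ X) @ theta_star
    bias = float(residual @ Sigma @ residual)
    # Noise propagated through pinv(X): E[eps^T A eps] = sigma2 * tr(A).
    variance = float(sigma2 * np.trace(X_pinv.T @ Sigma @ X_pinv))
    return bias, variance


def monte_carlo_risk(X, theta_star, Sigma, sigma2, trials, rng):
    """Average of (theta_hat - theta_star)^T Sigma (theta_hat - theta_star)
    over fresh Gaussian noise draws, as a check on the closed form."""
    X_pinv = np.linalg.pinv(X)
    n = X.shape[0]
    total = 0.0
    for _ in range(trials):
        y = X @ theta_star + rng.normal(0.0, np.sqrt(sigma2), size=n)
        diff = X_pinv @ y - theta_star
        total += diff @ Sigma @ diff
    return total / trials


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 100, 1000  # overparameterized regime: d > n
    for alpha in (0.5, 1.0, 2.0):
        X, Sigma = power_law_design(n, d, alpha, rng)
        theta_star = rng.standard_normal(d) / np.sqrt(d)
        b, v = bias_variance(X, theta_star, Sigma, sigma2=0.1)
        print(f"alpha={alpha}: bias={b:.4f}, variance={v:.4f}")
```

Only the estimator without feature learning is sketched, since the abstract does not specify the initialization or step size of the one-step gradient update in the two-layer model.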

Journal: Information Geometry
Publication Date: Jul 28, 2024
Citations: 1
License type: CC BY 4.0

Similar Papers

Gradient Descent for Non-convex Problems in Modern Machine Learning
27 Jun 2019

Mechanism for feature learning in neural networks and backpropagation-free machine learning models.
Adityanarayanan Radhakrishnan ... Mikhail Belkin
Science | VOL. 383
07 Mar 2024

Two-layer neural network on infinite-dimensional data: global optimization guarantee in the mean-field regime
Naoki Nishikawa ... Denny Wu
Journal of Statistical Mechanics: Theory and Experiment | VOL. 2023
01 Nov 2023

On-line learning with adaptive back-propagation in two-layer networks
Ansgar H L West ... David Saad
Physical Review E | VOL. 56
01 Sep 1997
