Dimension independent excess risk by stochastic gradient descent

Xi Chen, Qiang Liu, Xin T. Tong

doi:10.1214/22-ejs2055

Abstract

One classical canon of statistics is that large models are prone to overfitting, and that model selection procedures are necessary for high-dimensional data. However, many overparameterized models, such as neural networks, perform very well in practice, although they are often trained with simple online methods and regularization. The empirical success of overparameterized models, a phenomenon often known as benign overfitting, motivates us to take a new look at the statistical generalization theory for online optimization. In particular, we present a general theory on the excess risk of stochastic gradient descent (SGD) solutions for both convex and locally non-convex loss functions. We further discuss data and model conditions that lead to a “low effective dimension”. Under these conditions, we show that the excess risk either does not depend on the ambient dimension p or depends on p only through a poly-logarithmic factor. We also demonstrate that in several widely used statistical models, the “low effective dimension” arises naturally in overparameterized settings. The studied statistical applications include both convex models, such as linear regression and logistic regression, and non-convex models, such as M-estimators and two-layer neural networks.
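To make the idea of a "low effective dimension" more concrete, the following is a minimal, hypothetical simulation sketch. It is not the authors' code or their exact setting. It assumes Gaussian features with a rapidly decaying diagonal covariance Sigma = diag(i^-2), true weights w_star_i = 1/i, noise standard deviation 0.5, and one-pass constant-step SGD with iterate averaging for least squares. The function names `sgd_excess_risk` and `initial_risk` are illustrative. Under these assumptions, trace(Sigma) and ||w_star||^2 stay bounded as p grows, so the excess risk obtained for p = 100 and for p = 2000 should be of similar size: the many low-variance directions contribute little however numerous they are. This illustrates the phenomenon only and does not check the paper's bounds.

```python
import numpy as np


def sgd_excess_risk(p, n=2000, step=0.1, noise=0.5, seed=0):
    """Averaged one-pass SGD for least squares (hypothetical illustration).

    Returns the excess risk (w_bar - w_star)^T Sigma (w_bar - w_star) of the
    averaged iterate. Assumptions (not from the paper): x ~ N(0, Sigma) with
    Sigma = diag(i^{-2}), w_star_i = 1/i, Gaussian label noise, constant step.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(1, p + 1)
    eigs = 1.0 / idx**2      # decaying spectrum: trace(Sigma) < pi^2/6 for every p
    w_star = 1.0 / idx       # ||w_star||^2 also stays bounded as p grows

    w = np.zeros(p)          # current SGD iterate, started at zero
    w_bar = np.zeros(p)      # running (Polyak) average of the iterates
    for t in range(n):
        x = np.sqrt(eigs) * rng.standard_normal(p)       # one fresh sample x ~ N(0, Sigma)
        y = x @ w_star + noise * rng.standard_normal()   # noisy linear response
        w -= step * (x @ w - y) * x                      # gradient step on 0.5 * (x^T w - y)^2
        w_bar += (w - w_bar) / (t + 1)                   # update the average of w_1, ..., w_{t+1}
    diff = w_bar - w_star
    return float(diff @ (eigs * diff))


def initial_risk(p):
    """Excess risk of the starting point w = 0, i.e. w_star^T Sigma w_star."""
    idx = np.arange(1, p + 1)
    return float(np.sum(idx**-4.0))


if __name__ == "__main__":
    # Compare a moderate and a much larger ambient dimension p.
    for p in (100, 2000):
        print(p, initial_risk(p), sgd_excess_risk(p))
```

The averaged iterate is used because constant-step SGD without averaging keeps fluctuating at a level proportional to the step size; averaging is one standard way to reduce that noise, chosen here for simplicity.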

Journal: Electronic Journal of Statistics
Publication date: Jan 1, 2022
License: CC BY

Similar Papers

Solving High-Dimensional Multi-Objective Optimization Problems with Low Effective Dimensions
Hong Qian ... Yang Yu
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31, 12 Feb 2017

Gradient Descent for Non-convex Problems in Modern Machine Learning
27 Jun 2019

EMT-ReMO: Evolutionary Multitasking for High-Dimensional Multi-Objective Optimization via Random Embedding
Yinglan Feng ... Yaqing Hou
28 Jun 2021

Stochastic Natural Gradient Descent by estimation of empirical covariances
Luigi Malago ... Giovanni Pistone
01 Jun 2011
