Gradient descent for deep matrix factorization: Dynamics and implicit bias towards low rank

Hung-Hsu Chou, Carsten Gieshoff, Johannes Maly, Holger Rauhut

doi:10.1016/j.acha.2023.101595

Abstract

In deep learning, it is common to use more network parameters than training points. In such a scenario of over-parameterization, there are usually multiple networks that achieve zero training error, so that the training algorithm induces an implicit bias on the computed solution. In practice, (stochastic) gradient descent tends to prefer solutions which generalize well, which provides a possible explanation of the success of deep learning. In this paper we analyze the dynamics of gradient descent in the simplified setting of linear networks and of an estimation problem. Although we are not in an over-parameterized scenario, our analysis nevertheless provides insights into the phenomenon of implicit bias. In fact, we provide a rigorous analysis of the dynamics of vanilla gradient descent and characterize the dynamical convergence of the spectrum. We are able to accurately locate time intervals where the effective rank of the iterates is close to the effective rank of a low-rank projection of the ground-truth matrix. In practice, those intervals can be used as criteria for early stopping if a certain regularity is desired. We also provide empirical evidence for implicit bias in more general scenarios, such as matrix sensing and random initialization. This suggests that deep learning prefers trajectories whose complexity (measured in terms of effective rank) is monotonically increasing, which we believe is a fundamental concept for the theoretical understanding of deep learning.
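
The setting described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' code: the loss is taken to be L(W_1, ..., W_N) = 0.5 * ||W_N ⋯ W_1 - A||_F^2, every factor starts at alpha * I, and A is diagonal with positive entries (by an orthogonal change of basis this also covers a symmetric positive definite A). Under these assumptions all factors stay equal and diagonal, so each diagonal entry w follows the scalar update w <- w - eta * w^(N-1) * (w^N - s), where s is the matching singular value of A. Complexity is measured with the entropy-based effective rank (exp of the Shannon entropy of the normalized singular values), which may differ from the exact measure used in the paper; the names `effective_rank` and `simulate_diagonal_gd` are hypothetical.

```python
import math


def effective_rank(singular_values):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    singular values normalized to sum to one (assumed measure)."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def simulate_diagonal_gd(targets, depth=3, alpha=0.1, step=0.05, num_steps=6000):
    """Vanilla gradient descent on 0.5 * ||W_N ... W_1 - diag(targets)||_F^2
    with every factor initialized as alpha * I (hypothetical helper).

    Because all factors stay equal and diagonal, each diagonal entry w
    follows w <- w - step * w**(depth - 1) * (w**depth - target).
    Returns the diagonal of the final product and the effective rank of
    every iterate, starting with the initialization.
    """
    weights = [alpha] * len(targets)

    def product_diagonal(ws):
        # Diagonal of W_N ... W_1; its absolute values are the singular values.
        return [w ** depth for w in ws]

    trajectory = [effective_rank([abs(p) for p in product_diagonal(weights)])]
    for _ in range(num_steps):
        # The gradient w.r.t. each factor is w**(depth-1) * (w**depth - target).
        weights = [
            w - step * w ** (depth - 1) * (w ** depth - t)
            for w, t in zip(weights, targets)
        ]
        trajectory.append(effective_rank([abs(p) for p in product_diagonal(weights)]))
    return product_diagonal(weights), trajectory


if __name__ == "__main__":
    targets = [1.0, 0.5, 0.1]  # assumed singular values of the ground truth A
    final, ranks = simulate_diagonal_gd(targets)
    for k in (0, 100, 200, 400, 800, 1600, 3200, 6000):
        print(f"step {k:5d}: effective rank {ranks[k]:.3f}")
    print(f"target effective rank: {effective_rank(targets):.3f}")
    print("final singular values:", [round(p, 4) for p in final])
```

In this sketch the effective rank starts at 3 (the initialization alpha^3 * I has equal singular values), drops close to 1 while only the dominant singular value has been fitted, and then rises toward the effective rank of A as the smaller singular values are picked up; the plateaus in between are the kind of time intervals the abstract proposes for early stopping. The diagonal reduction keeps the example dependency-free; a general A or random initialization would need full matrix products and an SVD.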

Journal: Applied and Computational Harmonic Analysis
Publication Date: Sep 6, 2023
Citations: 3
License type: other-oa

Similar Papers

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias Towards Low Rank
Hung-Hsu Chou ... Carsten Gieshoff
SSRN Electronic Journal, 01 Jan 2021

Dynamics of stochastic gradient descent for two-layer neural networks in the teacher–student setup
(Updated version of: Goldt S, Advani M S, Saxe A M, Krzakala F and Zdeborova L 2019, Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup, Advances in Neural Information Processing Systems, pp 6981–91.)
Sebastian Goldt ... Florent Krzakala
Journal of Statistical Mechanics: Theory and Experiment, Vol. 2020, 01 Dec 2020

Phase diagram of stochastic gradient descent in high-dimensional two-layer neural networks
Rodrigo Veiga ... Lenka Zdeborová
Journal of Statistical Mechanics: Theory and Experiment, Vol. 2023, 01 Nov 2023

Theoretical issues in deep networks
Tomaso Poggio ... Qianli Liao
Proceedings of the National Academy of Sciences, Vol. 117, 09 Jun 2020
