DSD²: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?

Victor Quétu, Enzo Tartaglione

doi:10.1609/aaai.v38i13.29393

Abstract

Recent works have shown that modern deep learning models can exhibit a sparse double descent phenomenon. As the sparsity of the model increases, test performance first worsens because the model overfits the training data; then the overfitting diminishes, which improves performance; and finally the model begins to forget critical information, resulting in underfitting. Such behavior prevents the use of traditional early-stopping criteria. This work makes three key contributions. First, we propose a learning framework that avoids this phenomenon and improves generalization. Second, we introduce an entropy measure that provides more insight into the onset of this phenomenon and enables the use of traditional stopping criteria. Third, we provide a comprehensive quantitative analysis of contingent factors such as re-initialization methods, model width and depth, and dataset noise. The contributions are supported by empirical evidence in typical setups. Our code is available at https://github.com/VGCQ/DSD2.

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 1

Similar Papers

Identification of neonatal hearing impairment: evaluation of transient evoked otoacoustic emission, distortion product otoacoustic emission, and auditory brain stem response test performance.
Susan J Norton ... Judith E Widen
Ear and Hearing | VOL. 21
01 Oct 2000

COMPARING DIFFERENT STOPPING CRITERIA FOR FUZZY DECISION TREE INDUCTION THROUGH IDFID3
...
Iranian Journal of Fuzzy Systems | VOL. 11
25 Feb 2014

An early stopping criterion for decoding LDPC codes in WiMAX and WiFi standards
Zhixiang Chen ... Xiao Peng
-
01 May 2010

Early Stopping Criterion Combining Probability Density Function with Validation Error for Improving the Generalization Capability of the Backpropagation Neural Network
Wei Wang ... Xue-Feng Yan
DEStech Transactions on Engineering and Technology Research | VOL. -
02 Mar 2018