Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Utku Evci,Cem Keskin,Yann Dauphin,Yani Ioannou

doi:10.1609/aaai.v36i6.20611

Abstract

Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly and; (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and propose a modified initialization for unstructured connectivity. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that LTs do not improve gradient flow, rather their success lies in re-learning the pruning solution they are derived from — however, this comes at the cost of learning novel solutions.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Abstract

Talk to us

Similar Papers

More From: Proceedings of the AAAI Conference on Artificial Intelligence

Lead the way for us

Journal: Proceedings of the AAAI Conference on Artificial Intelligence	Publication Date: Jun 28, 2022
Citations: 15

Similar Papers

A Brain-inspired Algorithm for Training Highly Sparse Neural Networks
...
arXiv (Cornell University) | VOL. -
, et. al. ...
17 Mar 2019
arXiv (Cornell University) | VOL. -

A brain-inspired algorithm for training highly sparse neural networks
Zahra Atashgahi ... Decebal Constantin Mocanu
Machine Learning | VOL. 111
Zahra Atashgahi, et. al.Zahra Atashgahi ... Decebal Constantin Mocanu
08 Nov 2022
Machine Learning | VOL. 111

A CPU-based algorithm for traffic optimization based on sparse convolutional neural networks
Zhizhou Li ... Justin Eichel
-
Zhizhou Li, et. al. Zhizhou Li ... Justin Eichel
01 Apr 2017
01 Apr 2017

Topological Insights into Sparse Neural Networks
Shiwei Liu ... Davide Ferraro
-
Shiwei Liu, et. al.Shiwei Liu ... Davide Ferraro
01 Jan 2020
01 Jan 2020

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Abstract

Talk to us

Similar Papers

More From: Proceedings of the AAAI Conference on Artificial Intelligence