Lottery Jackpots Exist in Pre-Trained Models.

Yuxin Zhang, Fei Chao, Yunshan Zhong, Rongrong Ji, Mingbao Lin

doi:10.1109/tpami.2023.3311783

Abstract

Network pruning is an effective approach to reducing network complexity with an acceptable performance compromise. Existing studies achieve sparsity in neural networks through time-consuming weight training or complex searching on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing sparse sub-networks that require no weight training, termed "lottery jackpots", exist in pre-trained models with unexpanded width. The existence of lottery jackpots is evidenced by both empirical and theoretical results. For example, on CIFAR-10 we obtain a lottery jackpot that retains only 10% of the parameters yet matches the performance of the original dense VGGNet-19, without any modification to the pre-trained weights. Furthermore, we improve the efficiency of searching for lottery jackpots from two perspectives. First, we observe that the sparse masks derived from many existing pruning criteria overlap heavily with the searched mask of our lottery jackpot, and that magnitude-based pruning yields the mask most similar to ours. Based on this insight, we initialize our sparse mask using magnitude-based pruning, which reduces the cost of searching for lottery jackpots by at least 3× while achieving comparable or even better performance. Second, we conduct an in-depth analysis of the searching process for lottery jackpots. Our theoretical result suggests that the decrease in training loss during weight searching can be disrupted by dependencies between weights in modern networks. To mitigate this, we propose a novel short restriction method that restricts mask changes that may negatively affect the training loss, which leads to faster convergence and reduced oscillation when searching for lottery jackpots.
Consequently, our searched lottery jackpot removes 90% of the weights in ResNet-50 while still obtaining more than 70% top-1 accuracy on ImageNet with only 5 searching epochs. Our code is available at https://github.com/zyxxmu/lottery-jackpots.
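As a rough illustration of the magnitude-based mask initialization and the mask-overlap observation described in the abstract, the sketch below builds a binary mask that keeps the largest-magnitude weights at a given sparsity and measures how much two masks agree. The function names `magnitude_mask` and `mask_overlap`, the flat-list representation of a layer's weights, and the overlap definition (the fraction of one mask's kept weights that the other mask also keeps) are hypothetical choices made for this example. They are not taken from the authors' code, and the sketch does not implement the jackpot search itself or the short restriction method.

```python
from typing import List


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Hypothetical helper: keep the largest-|w| weights, prune a `sparsity` fraction.

    Returns a binary mask (1 = keep, 0 = prune) with the same length as `weights`.
    The weights themselves are never modified, mirroring the idea of selecting a
    sub-network from frozen pre-trained weights.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_keep = len(weights) - int(round(sparsity * len(weights)))
    # Indices sorted by descending magnitude; ties are broken by index for determinism.
    order = sorted(range(len(weights)), key=lambda i: (-abs(weights[i]), i))
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


def mask_overlap(mask_a: List[int], mask_b: List[int]) -> float:
    """Assumed overlap metric: fraction of positions kept by mask_a also kept by mask_b."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    kept_a = sum(mask_a)
    if kept_a == 0:
        return 0.0
    shared = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return shared / kept_a


if __name__ == "__main__":
    # Toy "pre-trained" layer with ten weights (illustrative values only).
    w = [0.9, -0.05, 0.4, -0.7, 0.01, 0.3, -0.2, 0.08, 0.6, -0.02]
    init_mask = magnitude_mask(w, sparsity=0.5)
    # Hypothetical mask standing in for the result of a jackpot search.
    searched_mask = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    print(init_mask)
    print(mask_overlap(init_mask, searched_mask))
```

With these toy values, `magnitude_mask` keeps indices 0, 2, 3, 5 and 8, and its overlap with the hypothetical searched mask (indices 0, 2, 3, 6 and 8) is 0.8. Under the assumptions above, a high overlap of this kind is what would make the magnitude-based mask a useful starting point, since the search then only needs to revise a small fraction of the kept positions.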

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Dec 1, 2023
Citations: 3

Similar Papers

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models
Tianlong Chen ... Sijia Liu
01 Jun 2021

Filter pruning via separation of sparsity search and model training
Youzao Lian ... Weisheng Xu
Neurocomputing | VOL. 462
02 Aug 2021

A PRE-TRAINED MODEL BERT FOR MACHINE TRANSLATION FROM ENGLISH TO TELUGU
International Journal For Innovative Engineering and Management Research
22 May 2022

Melanoma Classification from Dermoscopy Images Using Ensemble of Convolutional Neural Networks
Rehan Raza ... Gull Bano Anwar
Mathematics | VOL. 10
22 Dec 2021

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Lottery Jackpots Exist in Pre-Trained Models.

Abstract

Talk to us

Similar Papers

More From: IEEE Transactions on Pattern Analysis and Machine Intelligence