The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models

Tianlong Chen, Sijia Liu, Yang Zhang, Jonathan Frankle, Shiyu Chang, Michael Carbin, Zhangyang Wang

doi:10.1109/cvpr46437.2021.01604


Open Access

https://doi.org/10.1109/cvpr46437.2021.01604


Abstract

The computer vision world has been regaining enthusiasm for various pre-trained models, including both classical ImageNet supervised pre-training and recently emerged self-supervised pre-training such as simCLR [10] and MoCo [40]. Pre-trained weights often boost a wide range of downstream tasks, including classification, detection, and segmentation. Recent studies suggest that pre-training benefits from gigantic model capacity [11]. We therefore ask: after pre-training, does a pre-trained model indeed have to stay large for its downstream transferability? In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH) [31]. LTH identifies highly sparse matching subnetworks that can be trained in isolation from (nearly) scratch yet still reach the full models' performance. We extend the scope of LTH and ask whether matching subnetworks that enjoy the same downstream transfer performance still exist in pre-trained computer vision models. Our extensive experiments convey an overall positive message: from all pre-trained weights obtained by ImageNet classification, simCLR, and MoCo, we are consistently able to locate such matching subnetworks at 59.04% to 96.48% sparsity that transfer universally to multiple downstream tasks and whose performance sees no degradation compared to using the full pre-trained weights. Further analyses reveal that subnetworks found from different pre-training methods tend to yield diverse mask structures and perturbation sensitivities. We conclude that the core LTH observations remain generally relevant in the pre-training paradigm of computer vision, but more delicate discussions are needed in some cases. Code and pre-trained models will be made available at: https://github.com/VITA-Group/CV_LTH_Pre-training.
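The abstract does not say how the subnetworks are found. As a rough illustration only, the sketch below assumes iterative magnitude pruning, the procedure commonly paired with LTH, removing 20% of the surviving weights per round. Under that assumption the reported sparsity endpoints fall out of simple arithmetic: 1 - 0.8^4 = 59.04% and 1 - 0.8^15 = 96.48%. The function names, the 20% rate, and the flat-list weight representation are all hypothetical choices for this example, not taken from the paper or its code release.

```python
def sparsity_after_rounds(rounds, prune_rate=0.2):
    """Fraction of weights removed after `rounds` rounds, each of which
    prunes `prune_rate` of the weights that are still alive."""
    return 1.0 - (1.0 - prune_rate) ** rounds


def magnitude_prune(weights, mask, prune_rate=0.2):
    """Return a new 0/1 mask in which the smallest-magnitude surviving
    weights have been switched off (one pruning round)."""
    alive = [i for i, m in enumerate(mask) if m == 1]
    n_prune = int(len(alive) * prune_rate)
    # Sort surviving indices by absolute weight, smallest first.
    alive.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in alive[:n_prune]:
        new_mask[i] = 0
    return new_mask


if __name__ == "__main__":
    # Assumed schedule: 20% of remaining weights pruned per round.
    for r in (4, 15):
        print(f"round {r}: {100 * sparsity_after_rounds(r):.2f}% sparse")
    # Toy example: prune half of four weights by magnitude.
    print(magnitude_prune([0.5, -0.1, 0.3, -0.9], [1, 1, 1, 1], prune_rate=0.5))
```

In a full lottery-ticket experiment each pruning round would be followed by rewinding the surviving weights (here, to the pre-trained values) and retraining on the downstream task; that loop is omitted to keep the sketch short.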

Publication Date: Jun 1, 2021
Citations: 25
License type: cc-by-nc-sa
