Abstract

In recent years, the Vision Transformer (ViT) and its variants have come to dominate many computer vision tasks. However, the high computational cost and large training-data requirements of ViT make it difficult to deploy directly on resource-constrained devices and environments. Model compression is an effective way to accelerate deep networks, but existing methods for compressing ViT models are limited in scope and struggle to balance performance against computational cost. In this paper, we propose a novel Unified Cascaded Compression Framework (UCC) to compress ViT more precisely and efficiently. Specifically, we first analyze the frequency information within tokens and prune them based on a joint score combining both their spatial and spectral characteristics. We then propose a similarity-based token aggregation scheme that merges the rich contextual information contained in all pruned tokens into the host tokens according to their weights. Additionally, we introduce a novel cumulative cascaded pruning strategy that prunes tokens bottom-up based on cumulative scores, avoiding the information loss caused by the idiosyncrasies of individual blocks. Finally, we design a novel two-level distillation strategy, incorporating imitation and exploration, to ensure knowledge diversity and better performance recovery. Extensive experiments demonstrate that our unified cascaded compression framework outperforms most existing state-of-the-art approaches: it compresses the floating-point operations of the ViT-Base and DeiT-Base models by 22% and 54.1% while improving recognition accuracy by 3.74% and 1.89%, respectively. This significantly reduces computational consumption while enhancing performance, enabling efficient end-to-end training of compact ViT models.
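The pipeline described in the abstract — score each token jointly in the spatial and spectral domains, prune the low-scoring ones, then fold every pruned token into similar surviving "host" tokens by weight — can be illustrated with a minimal NumPy sketch. The concrete scoring choices below (token L2 norm as the spatial proxy, high-frequency FFT energy as the spectral proxy) and the softmax similarity weighting are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def prune_and_aggregate(tokens, keep_ratio=0.5, alpha=0.5):
    """Joint spatial/spectral token pruning with similarity-based aggregation.

    tokens: (N, D) array of token embeddings.
    keep_ratio: fraction of tokens kept as hosts.
    alpha: weight between the spatial and spectral score components.
    All scoring choices here are illustrative stand-ins, not UCC's method.
    """
    n, _ = tokens.shape
    # Spatial score: L2 norm of each token (a simple saliency proxy).
    spatial = np.linalg.norm(tokens, axis=1)
    # Spectral score: high-frequency energy of an FFT taken along the
    # feature dimension (a stand-in for the paper's frequency analysis).
    spec = np.abs(np.fft.rfft(tokens, axis=1))
    spectral = spec[:, spec.shape[1] // 2:].sum(axis=1)

    def minmax(x):
        return (x - x.min()) / (np.ptp(x) + 1e-8)

    # Joint score: weighted sum of the normalised components.
    score = alpha * minmax(spatial) + (1 - alpha) * minmax(spectral)

    k = max(1, int(n * keep_ratio))
    keep = np.argsort(score)[-k:]            # indices of host tokens
    drop = np.setdiff1d(np.arange(n), keep)  # indices of pruned tokens

    hosts = tokens[keep].copy()
    for i in drop:
        # Cosine similarity between the pruned token and each host.
        sims = hosts @ tokens[i] / (
            np.linalg.norm(hosts, axis=1) * np.linalg.norm(tokens[i]) + 1e-8)
        w = np.exp(sims) / np.exp(sims).sum()  # similarity weights
        # Blend the pruned token's information into the hosts, so its
        # contextual content is aggregated rather than discarded.
        hosts += w[:, None] * tokens[i] / max(len(drop), 1)
    return hosts, keep
```

In a real ViT block the spatial score would typically come from attention statistics and the pruning would be applied per layer under the cumulative cascaded schedule; this sketch only shows the score-prune-aggregate shape of one step.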
