Abstract
The Transformer and its variants achieve excellent results in various computer vision and natural language processing tasks, but high computational costs and reliance on large training datasets restrict their deployment in resource-constrained settings. Low-rank approximation of model weights has been effective in compressing CNN models, but its application to transformers has been less explored and is less effective. Existing methods require the complete dataset to fine-tune compressed models, a process that is both time-consuming and data-hungry. This paper reveals that the features (i.e., activations) are low-rank, whereas the model weights, surprisingly, are not. Hence, AAFM is proposed, which adaptively determines the compressed model's structure and locally compresses each linear layer's output features rather than its weights. A second stage, GFM, then optimizes the entire compressed network holistically. Both AAFM and GFM use only a few unlabeled training samples; that is, they are few-shot, unsupervised, fast, and effective. For example, with only 2K unlabeled images, 33% of the parameters in DeiT-B are removed with an 18.8% relative throughput increase but only a 0.23% accuracy loss on ImageNet recognition. The proposed methods also apply successfully to the language modeling task in NLP, and the few-shot compressed models generalize well to downstream tasks.
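To make the core idea concrete, below is a minimal PyTorch sketch of the underlying operation the abstract describes: fitting a low-rank factorization to a linear layer's *output features* (via PCA on a few unlabeled calibration samples) rather than to its weight matrix. The function name and calibration interface are illustrative assumptions, not the paper's API, and the actual AAFM additionally determines each layer's rank adaptively, which this sketch does not.

```python
import torch


@torch.no_grad()
def compress_linear_by_output_features(linear: torch.nn.Linear,
                                       calib_inputs: torch.Tensor,
                                       rank: int) -> torch.nn.Sequential:
    """Replace one nn.Linear with a rank-`rank` factorization fitted to its
    output features on a few unlabeled calibration inputs (not to its weights).
    `calib_inputs` may have any leading dims, ending in `linear.in_features`."""
    # Collect the layer's outputs on the calibration data, flattened to (N, d_out).
    feats = linear(calib_inputs).reshape(-1, linear.out_features)
    mean = feats.mean(dim=0)                         # (d_out,)
    centered = feats - mean
    # Principal directions of the output features via covariance eigendecomposition.
    cov = centered.T @ centered / centered.shape[0]  # (d_out, d_out)
    _, eigvecs = torch.linalg.eigh(cov)              # eigenvalues in ascending order
    U = eigvecs[:, -rank:]                           # top-`rank` directions, (d_out, rank)
    # y ≈ U U^T (y - mean) + mean, so the layer factors into two smaller ones:
    #   down(x) = (U^T W) x + U^T (b - mean),  up(z) = U z + mean
    down = torch.nn.Linear(linear.in_features, rank)
    up = torch.nn.Linear(rank, linear.out_features)
    down.weight.copy_(U.T @ linear.weight)           # (rank, d_in)
    down.bias.copy_((linear.bias - mean) @ U)        # assumes the layer has a bias
    up.weight.copy_(U)                               # (d_out, rank)
    up.bias.copy_(mean)                              # restore the feature mean
    return torch.nn.Sequential(down, up)
```

Such a factorization replaces the original d_in x d_out weight matrix with rank x (d_in + d_out) parameters, so it saves parameters whenever the chosen rank is small relative to the layer's dimensions; the payoff of compressing features instead of weights is that, per the paper's observation, the activations are low-rank even when the weight matrices are not.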