Abstract
Vision Transformer (ViT) achieves excellent accuracy in image recognition and has been actively studied across many fields. However, ViT relies on attention, which requires large, computationally expensive matrix multiplications. Because the cost of the self-attention used in ViT grows quadratically with the number of tokens, reducing this cost by pruning tokens has become an active research topic in recent years. Token pruning requires setting a pruning rate, and in many studies this rate is set manually. However, the appropriate pruning rate varies from task to task, making it difficult to determine manually. In this study, we propose a method to solve this problem: a pruning rate adjustment that tunes the pruning rate via Gradient-Aware Scaling (GAS) so that the training loss converges. In addition, we propose Variable Proportional Attention (VPA) for Top-K, a general-purpose token pruning method, to mitigate the performance loss caused by pruning. On the CIFAR-10 dataset, several competitive pruning methods achieve higher recognition accuracy with our adjustment than with manually set pruning rates; eTPS+Adjust on Hybrid ViT-S achieves 99.01% accuracy with -31.68% FLOPs. Furthermore, for inference with a trained ViT-L on ImageNet-1k, Top-K+VPA outperforms token merging at large pruning rates and scales better in the accuracy-latency trade-off. In particular, Top-K+VPA applied to ViT-L in a GPU environment with a pruning rate of 6% achieves 80.62% accuracy on ImageNet-1k with -50.44% FLOPs and -46.8% latency.
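As a rough illustration of the Top-K token pruning referenced above, the following minimal sketch keeps only the highest-scoring patch tokens between transformer blocks. It is not the paper's exact implementation: the function name, tensor shapes, and the use of CLS-token attention as the importance score are illustrative assumptions.

```python
import torch

def topk_token_pruning(tokens: torch.Tensor,
                       cls_attn: torch.Tensor,
                       keep_ratio: float) -> torch.Tensor:
    """Keep the top-K patch tokens ranked by CLS attention (illustrative sketch).

    tokens:     (B, N, D) token embeddings; index 0 is the CLS token.
    cls_attn:   (B, N-1) attention from the CLS token to each patch token,
                e.g. averaged over heads (an assumed scoring choice).
    keep_ratio: fraction of patch tokens to keep, i.e. 1 - pruning rate.
    """
    B, N, D = tokens.shape
    k = max(1, int((N - 1) * keep_ratio))
    # Indices of the k highest-scoring patch tokens per batch element.
    idx = cls_attn.topk(k, dim=1).indices                        # (B, k)
    patch_tokens = tokens[:, 1:, :]
    kept = patch_tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))
    # Re-attach the CLS token in front of the kept patch tokens.
    return torch.cat([tokens[:, :1, :], kept], dim=1)

# Example: keep 70% of patch tokens (pruning rate 30%).
# pruned = topk_token_pruning(tokens, cls_attn, keep_ratio=0.7)
```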