A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking.

Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou

doi:10.1109/tpami.2024.3392941

https://doi.org/10.1109/tpami.2024.3392941

Abstract

Vision Transformer (ViT) architectures are becoming increasingly popular and are widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, the deployment cost of ViTs has grown steadily alongside their performance, driven by their size, number of trainable parameters, and number of operations. Furthermore, the computational and memory cost of self-attention increases quadratically with the image resolution. Generally speaking, it is challenging to employ these architectures in real-world applications because of hardware and environmental restrictions, such as limited processing and computational capabilities. Therefore, this survey investigates the most efficient methodologies for ensuring sub-optimal estimation performance. In more detail, four categories of efficiency techniques are analyzed: compact architecture, pruning, knowledge distillation, and quantization strategies. Moreover, a new metric called Efficient Error Rate is introduced in order to normalize and compare the model features that affect hardware devices at inference time, such as the number of parameters, bits, FLOPs, and model size. In summary, this paper first mathematically defines the strategies used to make Vision Transformers efficient, then describes and discusses state-of-the-art methodologies, and analyzes their performance across different application scenarios. Toward the end of the paper, we also discuss open challenges and promising research directions.
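
As a rough illustration of the quadratic scaling mentioned in the abstract, the sketch below counts the tokens and the entries of one self-attention matrix for a square image split into non-overlapping patches. The patch size of 16, the square input, and the omission of any class token are assumptions made for this example, and the function name attention_cost is hypothetical; none of these come from the paper.

```python
def attention_cost(image_size: int, patch_size: int = 16) -> tuple[int, int]:
    """Return (num_tokens, attention_matrix_entries) for a square image.

    Assumes non-overlapping square patches and no class token.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    tokens_per_side = image_size // patch_size
    num_tokens = tokens_per_side ** 2
    # Each token attends to every token, so one attention matrix
    # has num_tokens * num_tokens entries per head.
    return num_tokens, num_tokens ** 2


if __name__ == "__main__":
    for size in (224, 448, 896):
        tokens, entries = attention_cost(size)
        print(f"{size}x{size}: {tokens} tokens, {entries} attention entries per head")
```

Under these assumptions, doubling the side length of the image quadruples the number of tokens and multiplies the size of each attention matrix by 16, which is the growth that the efficiency techniques surveyed in the paper aim to contain.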

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jan 1, 2024
Citations: 4
License type: CC BY-NC-ND 4.0

Similar Papers

Minimalist Deployment of Neural Network Equalizers in a Bandwidth-Limited Optical Wireless Communication System with Knowledge Distillation.
Yiming Zhu ... Yuan Wei
Sensors (Basel, Switzerland) | VOL. 24
01 Mar 2024

Divide and Distill: New Outlooks on Knowledge Distillation for Environmental Sound Classification
Achyut Mani Tripathi ... Om Jee Pandey
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 31
01 Jan 2023

DS-P3SNet: An Efficient Classification Approach for Devanagari Script-Based P300 Speller Using Compact Channelwise Convolution and Knowledge Distillation
Ghanahshyam B Kshirsagar ... Narendra D Londhe
IEEE Transactions on Systems, Man, and Cybernetics: Systems | VOL. 52
01 Dec 2022

Effective Online Knowledge Distillation via Attention-Based Model Ensembling
Diana-Laura Borza ... Alexandru-Ion Marinescu
Mathematics | VOL. 10
16 Nov 2022
