Less Is More: Pay Less Attention in Vision Transformers

Zizheng Pan, Jianfei Cai, Bohan Zhuang, Haoyu He, Jing Liu

doi:10.1609/aaai.v36i2.20099

Abstract

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works can be prohibitively expensive due to the quadratic complexity of self-attention over a long sequence of representations, especially for high-resolution dense prediction tasks. To this end, we present a novel Less attention vIsion Transformer (LIT), building upon the fact that the early self-attention layers in Transformers still focus on local patterns and bring minor benefits in recent hierarchical vision Transformers. Specifically, we propose a hierarchical Transformer where we use pure multi-layer perceptrons (MLPs) to encode rich local patterns in the early stages while applying self-attention modules to capture longer dependencies in deeper layers. Moreover, we further propose a learned deformable token merging module to adaptively fuse informative patches in a non-uniform manner. The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation, serving as a strong backbone for many vision tasks. Code is available at https://github.com/zip-group/LIT.
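
The cost argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: the 224x224 input, the 4x4 patch size, the four-stage pyramid that halves the spatial side at each stage, and the choice of two attention-free stages are assumptions made for this example and are not stated in the abstract. The function names are hypothetical and do not come from the LIT code base. The sketch counts the tokens at each stage and the number of entries in one full self-attention map, which grows quadratically with the token count.

```python
# Illustrative arithmetic only. Image size, patch size, stage count, and the
# number of attention-free stages are assumptions, not values from the abstract.

def tokens_per_stage(image_size=224, patch_size=4, num_stages=4):
    """Token count at each stage when every stage halves the spatial side."""
    side = image_size // patch_size
    counts = []
    for _ in range(num_stages):
        counts.append(side * side)
        side //= 2
    return counts


def attention_pairs(num_tokens):
    """Entries in one full self-attention map (every token attends to every token)."""
    return num_tokens * num_tokens


def stage_plan(attention_free_stages=2, **kwargs):
    """Label each stage 'mlp' or 'attention' and record its attention-map size."""
    plan = []
    for index, n in enumerate(tokens_per_stage(**kwargs)):
        if index < attention_free_stages:
            # MLP-only stage: no token-to-token attention map is computed.
            plan.append((index + 1, n, "mlp", 0))
        else:
            plan.append((index + 1, n, "attention", attention_pairs(n)))
    return plan


if __name__ == "__main__":
    for stage, n, kind, pairs in stage_plan():
        print(f"stage {stage}: {n:5d} tokens, {kind:9s}, {pairs} attention-map entries")
```

Under these assumptions the first stage holds 3136 tokens, so a single full attention map there would have 3136 x 3136 = 9,834,496 entries, against 2401 entries at the last stage with 49 tokens. This is the imbalance that motivates restricting self-attention to the deeper stages.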

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 35

Similar Papers

The Art of Seeing: A Computer Vision Journey into Object Detection
Mohammad Salman Khan ... Ayesha Imran
06 May 2024

Texture Patterns for Object Recognition and Content-Based Color Image Retrieval
21 Dec 2020

Model distillation for high-level semantic understanding: a survey
Ruoyu Sun ... Hongkai Xiong
Journal of Image and Graphics | VOL. 28
01 Jan 2023

Investigation of Effectiveness of Shuffled Frog-Leaping Optimizer in Training a Convolution Neural Network.
Soroush Baseri Saadi ... Ramin Ranjbarzadeh
Journal of Healthcare Engineering | VOL. 2022
23 Mar 2022
