Visualizing and Understanding Patch Interactions in Vision Transformer.

Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei

doi:10.1109/tnnls.2023.3270479

Abstract

Vision transformer (ViT) has become a leading tool in various computer vision tasks, owing to its self-attention mechanism, which learns visual representations explicitly through cross-patch information interactions. Despite this success, the literature seldom explores the explainability of ViT, and there is no clear picture of how the attention mechanism, with respect to correlations across patches, affects performance or what further potential it holds. In this work, we propose a novel explainable visualization approach to analyze and interpret the crucial attention interactions among patches in ViT. Specifically, we first introduce a quantification indicator to measure the impact of patch interaction and verify this quantification on attention window design and the removal of indiscriminative patches. Then, we exploit the effective responsive field of each patch in ViT and devise a window-free transformer (WinfT) architecture accordingly. Extensive experiments on ImageNet demonstrate that the proposed quantitative method facilitates ViT model learning, improving top-1 accuracy by up to 4.28%. Moreover, results on downstream fine-grained recognition tasks further validate the generalization of our proposal.
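
To make the notions of cross-patch interaction and an attention window concrete, the following is a minimal sketch in plain Python. It is not the paper's quantification indicator or the WinfT architecture. The function name `patch_attention`, the identity query/key projections, and the 1-D `window` parameter are all assumptions made for illustration; a real ViT uses learned projections, multiple heads, and a 2-D patch grid.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats (-inf entries map to 0)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def patch_attention(patches, window=None):
    """Single-head scaled dot-product attention weights between patches.

    patches: list of equal-length float vectors, one per image patch.
             Queries and keys are the raw embeddings (assumption: identity
             projections, chosen only to keep the sketch short).
    window:  hypothetical parameter; if set, patch i may attend only to
             patches j with |i - j| <= window (a 1-D stand-in for a local
             attention window). None means every patch sees every patch.
    Returns an N x N matrix whose row i holds patch i's attention weights.
    """
    n = len(patches)
    scale = 1.0 / math.sqrt(len(patches[0]))
    attn = []
    for i in range(n):
        scores = []
        for j in range(n):
            if window is not None and abs(i - j) > window:
                scores.append(float("-inf"))  # masked out: no interaction
            else:
                dot = sum(a * b for a, b in zip(patches[i], patches[j]))
                scores.append(scale * dot)
        attn.append(softmax(scores))
    return attn


# Toy input: 6 random "patch embeddings" of dimension 8.
random.seed(0)
patches = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]

full = patch_attention(patches)             # global attention
local = patch_attention(patches, window=1)  # each patch sees only its neighbours
```

Each row of `full` shows how strongly one patch draws on every other patch, while `local` forces the weights outside the window to zero. Comparing the two matrices is one simple way to see which interactions a window design discards.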

Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: Oct 1, 2024
Citations: 11

Similar Papers

Summary of fine-grained image recognition based on attention mechanism
Yao Ma ... Min Zhi
16 Feb 2022

Hyperrealistic Image Inpainting with Hypergraphs
Gourav Wadhwa ... Abhinav Dhall
01 Jan 2020

An Empirical Study of Remote Sensing Pretraining
Di Wang ... Dacheng Tao
IEEE Transactions on Geoscience and Remote Sensing | VOL. 61
01 Jan 2023

Sparse Graph Transformer With Contrastive Learning
Chun-Yang Zhang ... C L Philip Chen
IEEE Transactions on Computational Social Systems | VOL. 11
01 Feb 2024
