Efficient image analysis with triple attention vision transformer

Gehui Li, Tongtong Zhao

doi:10.1016/j.patcog.2024.110357

Abstract

This paper introduces TrpViT, a novel triple attention vision transformer that efficiently captures both local and global features. The proposed architecture tackles global information acquisition by employing three complementary attention mechanisms in a unique attention block: Window, Dilated, and Channel attention. This attention block extracts spatially local features while expanding the receptive field to capture richer global context. By integrating this attention block with convolution, a new C-C-T-T architecture is formed. We rigorously evaluate TrpViT, demonstrating state-of-the-art performance on various computer vision tasks, including image classification, 2D and 3D object detection, instance segmentation, and low-level image colorization. Notably, TrpViT achieves strong accuracy across all parameter scales, highlighting its computational efficiency and effectiveness.
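The abstract does not spell out how the Window and Dilated attention branches choose which tokens attend to one another. As a rough illustration only, the sketch below assumes a common convention: window attention groups contiguous tiles of tokens, while dilated attention groups tokens sampled at a regular stride. The function names, the grid size, and the grouping rules are all hypothetical and are not taken from the TrpViT implementation. Channel attention, which operates across feature channels rather than spatial positions, is not covered here.

```python
# Illustrative sketch only. The grouping rules below are assumptions about how
# window and dilated attention are commonly defined, not the TrpViT code.
# Both functions assume height and width are divisible by `window`.

def window_groups(height, width, window):
    """Group token coordinates into contiguous window x window tiles."""
    groups = {}
    for row in range(height):
        for col in range(width):
            key = (row // window, col // window)
            groups.setdefault(key, []).append((row, col))
    return list(groups.values())


def dilated_groups(height, width, window):
    """Group tokens sampled at a regular stride, so each group of
    window x window tokens is spread over the whole grid."""
    stride_r, stride_c = height // window, width // window
    groups = {}
    for row in range(height):
        for col in range(width):
            key = (row % stride_r, col % stride_c)
            groups.setdefault(key, []).append((row, col))
    return list(groups.values())


def span(group):
    """Return the (rows, cols) extent of the grid area a group covers."""
    rows = [r for r, _ in group]
    cols = [c for _, c in group]
    return (max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)


# Hypothetical 8 x 8 token grid with 4 x 4 tokens per attention group.
local = window_groups(8, 8, 4)
dilated = dilated_groups(8, 8, 4)

print(len(local[0]), span(local[0]))      # tokens per group, area covered
print(len(dilated[0]), span(dilated[0]))  # same token count, wider area
```

Under these assumptions both schemes attend over the same number of tokens per group, so the attention cost per group is the same, but the dilated groups cover a much larger area of the grid. That is one plausible reading of how a block can extract local features while expanding the receptive field.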

Journal: Pattern Recognition
Publication Date: Feb 21, 2024
Citations: 4

Similar Papers

3D Object Detection and Instance Segmentation from 3D Range and 2D Color Images.
Xiaoke Shen ... Ioannis Stamos
Sensors | VOL. 21
09 Feb 2021

Bi-stage multi-modal 3D instance segmentation method for production workshop scene
Zaizuo Tang ... Yuanyuan Wu
Engineering Applications of Artificial Intelligence | VOL. 112
13 Apr 2022

Multifeature Fusion-Based Object Detection for Intelligent Transportation Systems
Shuo Yang ... Huimin Lu
IEEE Transactions on Intelligent Transportation Systems | VOL. 24
01 Jan 2023

3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data
Siddiqui Muhammad Yasir ... Hyunsik Ahn
Computers, Materials & Continua | VOL. 72
01 Jan 2021
