ECTFormer: An efficient Conv-Transformer model design for image recognition

Jaewon Sa, Junhwan Ryu, Heegon Kim

doi:10.1016/j.patcog.2024.111092

Abstract

Since the success of Vision Transformers (ViTs), there has been growing interest in the computer vision community in combining ConvNets and Transformers. While hybrid models have demonstrated state-of-the-art performance, many of them are too large and complex to deploy on edge devices for real-world applications. To address this challenge, we propose an efficient hybrid network called ECTFormer that leverages the strengths of ConvNets and Transformers while considering both model performance and inference speed. Specifically, our approach involves: (1) optimizing the combination of convolution kernels by dynamically adjusting kernel sizes based on the scale of feature tensors; (2) revisiting the existing overlapping patchify step to both reduce model size and propagate fine-grained patches, improving performance; and (3) introducing an efficient single-head self-attention mechanism, in place of the multi-head self-attention of the base Transformer, to minimize the increase in model size and boost inference speed, overcoming bottlenecks of ViTs. In experiments on ImageNet-1K, ECTFormer achieves comparable or higher top-1 accuracy and faster inference on both GPUs and edge devices than other efficient networks.
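To make point (3) concrete, the following is a minimal sketch of generic single-head scaled dot-product self-attention in plain Python. It is not the ECTFormer implementation: the abstract does not describe how the efficient variant is built, so the function names, the projection matrices w_q, w_k, w_v, and the toy input are all assumptions chosen for illustration. The sketch only shows what "single-head" means: one set of query, key, and value projections and one attention map, rather than several heads run in parallel and concatenated.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_self_attention(tokens, w_q, w_k, w_v):
    """Generic single-head attention (illustrative, not the paper's variant).

    tokens: N x D matrix of token features.
    w_q, w_k, w_v: hypothetical D x D_h projection matrices.
    Returns the N x D_h output and the N x N attention weights.
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))  # divide scores by sqrt(head dimension)
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Toy input: three tokens with two features each, identity projections.
    toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    output, attention = single_head_self_attention(
        toy_tokens, identity, identity, identity
    )
    for row in attention:
        print([round(w, 3) for w in row])
```

With a single head there is one N x N attention map and no per-head split or concatenation, which is one plausible reading of why the abstract associates it with a smaller model and faster inference.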
