Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition.

Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng

doi:10.1109/tpami.2024.3401450

Abstract

Vision Transformers have recently been the most popular network architecture in visual recognition due to their strong ability to encode global information. However, their high computational cost when processing high-resolution images limits their application in downstream tasks. In this paper, we take a deep look at the internal structure of self-attention and present a simple Transformer-style convolutional neural network (ConvNet) for visual recognition. By comparing the design principles of recent ConvNets and Vision Transformers, we propose to simplify self-attention by leveraging a convolutional modulation operation. We show that such a simple approach can better take advantage of the large kernels (≥ 7×7) nested in convolutional layers, and we observe a consistent performance improvement when gradually increasing the kernel size from 5×5 to 21×21. We build a family of hierarchical ConvNets using the proposed convolutional modulation, termed Conv2Former. Our network is simple and easy to follow. Experiments show that Conv2Former outperforms existing popular ConvNets and Vision Transformers, such as Swin Transformer and ConvNeXt, on all of ImageNet classification, COCO object detection, and ADE20K semantic segmentation. Our code is available at https://github.com/HVision-NKU/Conv2Former.
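The abstract does not spell out the convolutional modulation operation, so the following is only a minimal NumPy sketch of one plausible reading: the output of a large-kernel depthwise convolution is used as spatial weights that modulate a linearly projected "value" branch through an element-wise product, in place of the attention matrix of self-attention. The function names (`depthwise_conv2d`, `pointwise`, `conv_modulation`), the weight names, and the 11×11 kernel size are illustrative assumptions, not taken from the authors' code; the repository linked above is the authoritative implementation.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Depthwise 2D convolution (cross-correlation) with zero 'same' padding.

    x: array of shape (C, H, W); kernels: array of shape (C, k, k), k odd.
    Each channel is filtered by its own k x k kernel.
    """
    _, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    # Accumulate one shifted, per-channel-scaled copy of the input per tap.
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j, None, None] * xp[:, i:i + h, j:j + w]
    return out


def pointwise(x, weight):
    """1x1 convolution: mixes channels at every pixel. weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def conv_modulation(x, w_a, dw_kernels, w_v, w_out):
    """Assumed form of the modulation: out = W_out((DWConv(W_a x)) * (W_v x))."""
    a = depthwise_conv2d(pointwise(x, w_a), dw_kernels)  # spatial weights
    v = pointwise(x, w_v)                                # value branch
    return pointwise(a * v, w_out)                       # Hadamard product, then mix


# Hypothetical sizes: 4 channels, a 16x16 feature map, an 11x11 kernel.
rng = np.random.default_rng(0)
C, H, W, K = 4, 16, 16, 11
x = rng.standard_normal((C, H, W))
y = conv_modulation(
    x,
    rng.standard_normal((C, C)),
    rng.standard_normal((C, K, K)),
    rng.standard_normal((C, C)),
    rng.standard_normal((C, C)),
)
print(y.shape)  # (4, 16, 16)
```

Under this reading, the cost grows linearly with the number of pixels (each pixel touches a fixed k×k neighborhood), whereas global self-attention compares every pixel with every other pixel and so grows quadratically.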

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jan 1, 2024
Citations: 6

Similar Papers

Fish Shoals Behavior Detection Based on Convolutional Neural Network and Spatiotemporal Information
Fangfang Han ... Baofeng Zhang
IEEE Access | VOL. 8
01 Jan 2020

Simple dilated convolutional neural network for quantitative modeling based on near infrared spectroscopy techniques
Feng Gan ... Jianfei Luo
Chemometrics and Intelligent Laboratory Systems | VOL. 232
12 Nov 2022

Simple convolutional neural network on image classification
Tianmei Guo ... Henjian Li
01 Mar 2017

A self-supervised learning model based on variational autoencoder for limited-sample mammogram classification
Meryem Altin Karagoz ... O Ufuk Nalbantoglu
Applied Intelligence | VOL. 54
01 Feb 2024
