Crisscross-Global Vision Transformers Model for Very High Resolution Aerial Image Semantic Segmentation

Guohui Deng, Miaozhong Xu, Zhiye Wang, Zhongyuan Lu, Zhaocong Wu, Chengjun Wang

doi:10.1109/tgrs.2023.3276172

Abstract

Semantic segmentation is a key means of understanding very-high-resolution (VHR) aerial imagery. With the explosive development of deep learning, such methods are increasingly applied to the segmentation of VHR images, with convolutional neural networks (CNNs) as the basic framework. However, owing to the highly complex details present in VHR images and the high spatial dependence of geographical objects, CNN-based methods are inadequate: the inherent locality of CNNs limits the size of the receptive field and therefore the ability to obtain long-range context information. To solve this problem, we propose a novel transformer-based deep learning model called crisscross-global vision transformers (CGVT). CGVT exploits the transformer’s inherent ability to obtain long-range context information to solve the restricted receptive field problem. Specifically, we redesign the self-attention mechanism in the transformer and call it crisscross-global attention. It consists of two parts: the crisscross transformer encoder block (CC-TEB) and the global squeeze transformer encoder block (GS-TEB). CC-TEB overcomes the limitation of the traditional self-attention design (specifically, the difficulty of applying it to VHR aerial image segmentation) and further increases the local feature representation ability of the model. GS-TEB increases the global feature representation ability of the model. Experiments conducted on the popular ISPRS Vaihingen, IEEE GRSS Data Fusion Contest Zeebrugge, and LoveDA Semantic Segmentation Challenge datasets verify the effectiveness and superiority of our proposed method. Specifically, it achieved state-of-the-art performance on both the Zeebrugge and LoveDA datasets, and is currently ranked second on the Vaihingen dataset.
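
The abstract's point about the limited receptive field of CNNs can be illustrated with a short calculation. The sketch below is not taken from the paper; it applies the standard receptive-field recurrence for stacked convolutions, and the function name `receptive_field` and the layer stacks are hypothetical choices made only for illustration.

```python
def receptive_field(layers):
    """Return the receptive field, in input pixels, of a stack of conv layers.

    `layers` is a list of (kernel_size, stride) tuples, first layer first.
    Dilation and padding are ignored (assumption: plain convolutions).
    """
    rf = 1    # one output pixel sees one input pixel before any layer
    jump = 1  # spacing, in input pixels, between adjacent outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each layer widens the view
        jump *= stride                  # strides enlarge later growth
    return rf


# Hypothetical stacks of 3x3, stride-1 convolutions of increasing depth.
for depth in (1, 5, 10, 50):
    print(depth, receptive_field([(3, 1)] * depth))
```

With stride-1 3x3 layers the receptive field grows only linearly with depth (two pixels per layer), so covering a tile that is thousands of pixels wide requires very deep stacks or aggressive downsampling. Self-attention, by contrast, relates every position to every other position within a single layer.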

Journal: IEEE Transactions on Geoscience and Remote Sensing
Publication Date: Jan 1, 2023
Citations: 3

Similar Papers

An Object-Aware Network Embedding Deep Superpixel for Semantic Segmentation of Remote Sensing Images
Ziran Ye ... Baiyu Dong
13 Oct 2024
Remote Sensing | VOL. 16

Automatic detection of charcoal kilns on Very High Resolution images with a computer vision approach in Somalia
Astrid Verhegghen ... Marijn Van Der Velde
08 Nov 2023
International Journal of Applied Earth Observation and Geoinformation | VOL. 125

Hierarchical spatial features learning with deep CNNs for very high-resolution remote sensing image classification
Guangyun Zhang ... Xiuping Jia
22 Aug 2018
International Journal of Remote Sensing | VOL. 39

Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images
Jia Song ... Yunqiang Zhu
29 May 2023
Sensors (Basel, Switzerland) | VOL. 23
