Abstract

Applied research in remote sensing imagery has been driven by the convolutional neural network (CNN). Because of its fixed receptive field, a CNN cannot model global semantic relevance. Transformer-based models built on self-attention can model global semantic information; however, the patch-based computation the Transformer uses for self-attention ignores the spatial information inside each patch. To address these issues, we propose STransFuse, a new semantic segmentation model for remote sensing images that combines the strengths of the Transformer and the CNN to improve segmentation quality across diverse remote sensing images. Unlike earlier Transformer-fusion approaches, we employ a staged model to extract coarse-grained and fine-grained feature representations at multiple semantic scales. To take full advantage of the features acquired at different stages, we design an adaptive fusion module (AFM) that uses a self-attention mechanism to adaptively fuse semantic information between features at different scales. The overall accuracy (OA) of our proposed model is 1.36% higher than the baseline on the Vaihingen dataset and 1.27% higher on the Potsdam dataset. Compared with other advanced models, STransFuse performs competitively.
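
The abstract describes the adaptive fusion module only at a high level. Below is a minimal PyTorch sketch of what self-attention-based fusion of a fine-grained and a coarse-grained feature map could look like; the module name, tensor shapes, and wiring are our assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a self-attention-based adaptive fusion module (AFM).
# Names, shapes, and design details are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine:   (B, C, H, W) fine-grained features (e.g., from a CNN branch)
        # coarse: (B, C, h, w) coarse-grained features (e.g., from a Transformer branch)
        B, C, H, W = fine.shape
        # Upsample the coarse map to the fine resolution before fusing.
        coarse = F.interpolate(coarse, size=(H, W), mode="bilinear", align_corners=False)
        q = fine.flatten(2).transpose(1, 2)     # (B, H*W, C): queries from fine features
        kv = coarse.flatten(2).transpose(1, 2)  # (B, H*W, C): keys/values from coarse features
        fused, _ = self.attn(q, kv, kv)         # cross-scale attention
        fused = self.norm(fused + q)            # residual connection
        return fused.transpose(1, 2).reshape(B, C, H, W)
```

Using the fine features as queries keeps the output at the fine branch's spatial resolution while letting every position attend to the coarse branch's global context.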

Highlights

  • Semantic segmentation of remote sensing images, a pixel-level classification task, is an essential problem in remote sensing research

  • Many evaluation metrics are computed from the confusion matrix, whose entries are true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN); a sketch of how overall accuracy is derived from these counts follows this list

  • Because the Transformer relies on self-attention for semantic computation, Transformer-based models tend to have large parameter counts; the proposed STransFuse model balances parameter count against experimental performance
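
As promised above, here is a minimal NumPy sketch of computing the overall accuracy (OA) from confusion-matrix counts; the helper names and array layout are our choices, not code from the paper.

```python
# Minimal sketch: overall accuracy (OA) from a confusion matrix.
# cm[i, j] counts pixels whose true class is i and predicted class is j,
# so the diagonal holds the per-class TP counts.
import numpy as np

def confusion_matrix(pred: np.ndarray, label: np.ndarray, num_classes: int) -> np.ndarray:
    idx = label.astype(int) * num_classes + pred.astype(int)
    return np.bincount(idx.ravel(), minlength=num_classes ** 2).reshape(num_classes, num_classes)

def overall_accuracy(cm: np.ndarray) -> float:
    # OA = correctly classified pixels / all pixels = trace(cm) / sum(cm)
    return np.trace(cm) / cm.sum()

# Usage: cm = confusion_matrix(pred_map, label_map, num_classes=6)
#        oa = overall_accuracy(cm)
```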


Summary

INTRODUCTION

Semantic segmentation of remote sensing images, a pixel-level classification task, is an essential problem in remote sensing research. Inspired by the U-Net network [9], we fuse feature maps from different stages to obtain both the semantic and the spatial contextual information of the images. To this end, we propose STransFuse, a model for semantic segmentation of remote sensing images. Following [14], we use a ResNet with pretrained weights as the backbone of the CNN branch and combine it with the Swin Transformer to obtain rich feature information from remote sensing images.
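
To make the dual-branch idea concrete, here is a hypothetical PyTorch skeleton pairing a pretrained ResNet with a Swin Transformer branch and exposing per-stage feature maps for U-Net-style fusion; the class name, stage tapping, and the `swin_branch` interface are assumptions for illustration, not the authors' exact architecture.

```python
# Hypothetical skeleton of the dual-branch encoder described above.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class DualBranchEncoder(nn.Module):
    def __init__(self, swin_branch: nn.Module):
        super().__init__()
        cnn = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
        # Expose the ResNet stages so feature maps can be tapped per stage,
        # as in U-Net-style designs that fuse multi-stage features.
        self.stem = nn.Sequential(cnn.conv1, cnn.bn1, cnn.relu, cnn.maxpool)
        self.stages = nn.ModuleList([cnn.layer1, cnn.layer2, cnn.layer3, cnn.layer4])
        # swin_branch is assumed to return one feature map per stage as well.
        self.swin = swin_branch

    def forward(self, x: torch.Tensor):
        cnn_feats = []
        y = self.stem(x)
        for stage in self.stages:
            y = stage(y)
            cnn_feats.append(y)        # fine-grained CNN features per stage
        swin_feats = self.swin(x)      # coarse-grained Transformer features
        # Downstream, each (cnn_feat, swin_feat) pair would be fused by an
        # adaptive fusion module such as the one sketched earlier.
        return cnn_feats, swin_feats
```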

Semantic Segmentation of Remote Sensing Images
Contextual Information
Transformer
Overview
STransFuse Overall Architecture
Swin Transformer Block
AFM Block
Dataset
Evaluation Metric
Training Configuration
Ablation Studies
Visualization Analysis
Window Size Impact Analysis
Confusion Matrix
Evaluation and Comparisons on the Vaihingen Dataset
Evaluation and Comparisons on the Potsdam Dataset
Comparison of the Efficiency of State-of-the-Art Models in Different Datasets
Findings
CONCLUSION