Abstract

The semantic segmentation of fine-resolution remotely sensed images is an urgent issue in satellite image processing. Solving this problem can help overcome obstacles in urban planning, land cover classification, and environmental protection, paving the way for scene-level landscape pattern analysis and decision making. Encoder-decoder structures based on attention mechanisms have frequently been used for fine-resolution image segmentation. In this paper, we propose a novel convolutional neural network (CNN) architecture, the fusion coordinate and asymmetry-based U-Net (FCAU-Net), to fully capture the long-term dependencies and fine-grained details in fine-resolution remotely sensed imagery. The network incorporates a coordinate attention (CA) mechanism, adopts an asymmetric convolution block (ACB), and introduces a refinement fusion block (RFB). This approach has the following advantages: (1) the CA mechanism embeds position information into a channel attention mechanism to enhance the feature representations produced by the network while effectively capturing position information and channel relationships; (2) the ACB enhances the feature representation ability of the standard convolution layer, capturing and refining the feature information in each layer of the encoder; and (3) the RFB effectively integrates low-level spatial information and high-level abstract features to eliminate background noise during feature extraction, reduces the fitting residuals of the fused features, and improves the ability of the network to capture information flows. Extensive experiments conducted on two public datasets (ZY-3 and DeepGlobe) demonstrate the effectiveness of the FCAU-Net. The proposed FCAU-Net outperforms U-Net, Attention U-Net, the pyramid scene parsing network (PSPNet), DeepLab v3+, the multistage attention residual U-Net (MAResU-Net), MACU-Net, and the Transformer U-Net (TransUNet). Specifically, the FCAU-Net achieves a 97.97% (95.05%) pixel accuracy (PA), a 98.53% (91.27%) mean PA (mPA), a 95.17% (85.54%) mean intersection over union (mIoU), and a 96.07% (90.74%) frequency-weighted IoU (FWIoU) on the ZY-3 (DeepGlobe) dataset.
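As a reference for how the CA mechanism described above typically works, the following is a minimal PyTorch sketch of coordinate attention in the style of Hou et al. [47]: the input is pooled along the height and width axes separately, so the resulting channel weights retain position information. The reduction ratio, activation choice, and exact placement inside FCAU-Net are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention sketch: factorizes channel attention into two
    1-D poolings along the height and width axes, embedding position
    information into the channel weights."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                       # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)   # (B, C, W, 1)
        y = torch.cat([x_h, x_w], dim=2)           # shared 1x1 transform on both axes
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w                       # reweight features with position-aware attention
```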

Highlights

  • These datasets consist of remotely sensed cultivated land images obtained by Ziyuan-3 satellite sensors (ZY-3) and building images extracted from the DeepGlobe dataset

  • Based on the confusion matrix, we used the pixel accuracy (PA), mean pixel accuracy (mPA), mean intersection over union (mIoU), and frequency-weighted IoU (FWIoU) as the critical evaluation metrics to quantify the difference between the predicted mask and the ground truth (GT)

  • We evaluated the accuracy of the FCAU-Net and other architectures on the ZY-3 and DeepGlobe datasets based on the PA, mPA, mIoU, and FWIoU (a metric sketch follows this list)
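The following is a minimal NumPy sketch of the standard confusion-matrix definitions of these four metrics; the function name and the guard against empty classes are illustrative choices, not taken from the paper.

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray) -> dict:
    """Compute PA, mPA, mIoU, and FWIoU from a K x K confusion matrix,
    where conf[i, j] counts pixels of ground-truth class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    gt_per_class = conf.sum(axis=1).astype(float)    # ground-truth pixels per class
    pred_per_class = conf.sum(axis=0).astype(float)  # predicted pixels per class
    total = conf.sum()

    pa = tp.sum() / total                            # overall pixel accuracy
    pa_per_class = tp / np.maximum(gt_per_class, 1)
    mpa = pa_per_class.mean()                        # mean pixel accuracy
    union = gt_per_class + pred_per_class - tp
    iou = tp / np.maximum(union, 1)
    miou = iou.mean()                                # mean intersection over union
    freq = gt_per_class / total                      # class frequency weights
    fwiou = (freq * iou).sum()                       # frequency-weighted IoU
    return {"PA": pa, "mPA": mpa, "mIoU": miou, "FWIoU": fwiou}
```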


Summary

Introduction

MAResU-Net [35] embeds a multistage attention model into the direct skip connections of the original U-Net, thereby refining the multiscale feature maps. Unlike these methods, which rely on expensive and heavyweight nonlocal or self-attention blocks, a coordinate attention (CA) mechanism [47] that effectively captures position information and channel-wise relationships has been proposed. (2) In the decoding process, we use an ACB to capture and refine the obtained features by enhancing the weights of the central crisscross positions to improve the convolutional layer's representation capabilities.
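For illustration, below is a minimal PyTorch sketch of a generic asymmetric convolution block in the spirit of ACNet (a square 3x3 branch plus 1x3 and 3x1 branches whose outputs are summed), which matches the criss-cross weighting described above; the normalization, activation, and channel settings are assumptions rather than FCAU-Net's exact ACB.

```python
import torch
import torch.nn as nn

class AsymmetricConvBlock(nn.Module):
    """Asymmetric convolution block sketch: summing square, horizontal, and
    vertical branches adds extra weight along the kernel's central row and
    column (the criss-cross positions)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.square = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.horizontal = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=(1, 3), padding=(0, 1), bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.vertical = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=(3, 1), padding=(1, 0), bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Branch outputs are fused by element-wise summation before activation.
        return self.act(self.square(x) + self.horizontal(x) + self.vertical(x))
```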

Methodology
CA Module
ACB Module
RFB Module
Datasets
Implementation Details
Evaluation Metrics
Experimental Results
Method
Ablation Study
Influence of the Input Size
Optimization
Limitations and Future Work
Conclusions
