Abstract

Semantic segmentation of remote sensing imagery is a fundamental task in intelligent interpretation. Because deep convolutional neural networks (DCNNs) have shown considerable ability to learn implicit representations from data, numerous recent works have transferred DCNN-based models to remote sensing data analysis. However, wide observation areas, complex and diverse objects, and variations in illumination and imaging angle make pixels easy to confuse, leading to undesirable results. Therefore, a neural network for remote sensing imagery semantic segmentation, named HCANet, is proposed to generate representative and discriminative representations for dense prediction. HCANet hybridizes cross-level contextual and attentive representations to emphasize the distinguishability of learned features. First, a cross-level contextual representation module (CCRM) is devised to exploit and harness superpixel contextual information. Second, a hybrid representation enhancement module (HREM) is designed to fuse cross-level contextual and self-attentive representations flexibly. Furthermore, the decoder incorporates the DUpsampling operation to boost efficiency losslessly. Extensive experiments are conducted on the Vaihingen and Potsdam benchmarks. The results indicate that HCANet achieves excellent performance in overall accuracy and mean intersection over union. In addition, the ablation study further verifies the superiority of CCRM.

Highlights

  • Remote sensing imagery (RSI) semantic segmentation has been a fundamental task in interpreting and parsing the observation areas and objects [1]

  • Convolutional block attention module (CBAM), DANet and NLNet are slightly problematic in enhancing the separability of representations extracted from remote sensing images

  • Attention-based methods are of little significance here, because they are impractical to apply to remote sensing image semantic segmentation, which is more than an essential computer vision task

Summary

Introduction

Remote sensing imagery (RSI) semantic segmentation has been a fundamental task in interpreting and parsing the observation areas and objects [1]. Inspired by the self-attention mechanism, SCAttNet (semantic segmentation network with spatial and channel attention mechanism for high-resolution remote sensing images) [24] was proposed to learn an attention map that adaptively aggregates contextual information for every point in RSI. LANet (local attention embedding to improve the semantic segmentation of remote sensing images) [25] designed a patch attention module to enrich the local context information in addition to the global context, leading to competitive performance with fewer computations. HCANet (hybridizing cross-level contextual and attentive representations neural network) is proposed for remote sensing imagery semantic segmentation. A sub-branch that introduces a non-local block is implemented to refine the encoded feature maps. Afterward, this module adopts a concatenation operation followed by a 1 × 1 convolution layer to inject both optimized representations before expansion.
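The fusion step described above, concatenation followed by a 1 × 1 convolution, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`concat_channels`, `conv1x1`, `fuse`), the `[channel][row][col]` list layout, and the hand-picked weights are all assumptions made for illustration, and in a trained network the weights would be learned.

```python
from typing import List

# Hypothetical layout for this sketch: a feature map is indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two feature maps along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("feature maps must share the same height and width")
    return a + b


def conv1x1(x: FeatureMap, weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """Apply a 1 x 1 convolution, i.e. a per-pixel linear map over channels.

    weight is indexed [out_channel][in_channel]; bias is indexed [out_channel].
    """
    in_ch, height, width = len(x), len(x[0]), len(x[0][0])
    if any(len(row) != in_ch for row in weight) or len(bias) != len(weight):
        raise ValueError("weight/bias shapes do not match the input channels")
    return [
        [
            [
                bias[o] + sum(weight[o][c] * x[c][i][j] for c in range(in_ch))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for o in range(len(weight))
    ]


def fuse(contextual: FeatureMap, attentive: FeatureMap,
         weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """Concatenate two representations, then mix them with a 1 x 1 convolution."""
    return conv1x1(concat_channels(contextual, attentive), weight, bias)


# Toy inputs: one channel each, 2 x 2 spatial size (values are arbitrary).
contextual = [[[1.0, 2.0], [3.0, 4.0]]]
attentive = [[[10.0, 20.0], [30.0, 40.0]]]
# One output channel that averages the two input channels (assumed weights).
fused = fuse(contextual, attentive, weight=[[0.5, 0.5]], bias=[0.0])
```

With these toy inputs, each output pixel is the average of the two input pixels at the same location. The 1 × 1 kernel mixes channels only, so the spatial size of the fused map matches that of the inputs.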

Semantic Segmentation of RSI
Attention Mechanism
The Proposed Method
Non-Local Block
Superpixel Context
DUpsampling
The Framework of HCANet
Superpixel Region Generation and Representation
Cross-Level Contextual Representation
Hybrid Representation Enhancement Module
ISPRS Vaihingen Dataset
ISPRS Potsdam Dataset
Implementation Details
Evaluation Metrics
Results on Vaihingen Test Set
Methods
Results on Potsdam Test Set
Ablation Study on CCRM
Conclusions
