Abstract

Semantic segmentation is a fundamental task in interpreting remote sensing imagery (RSI) for various downstream applications. Because of the high intra-class variance and inter-class similarity of RSI, rigidly transferring networks designed for natural images is inadvisable. To enhance the distinguishability of learnt representations, attention modules have been developed and applied to RSI, yielding satisfactory improvements. However, these designs capture contextual information by handling all pixels equally, regardless of whether they lie around edges. As a result, blurry boundaries are generated, raising high uncertainty in classifying the many adjacent pixels. We therefore propose an edge distribution attention module (EDA) to highlight the edge distributions of learnt feature maps in a self-attentive fashion. In this module, we first formulate and model column-wise and row-wise edge attention maps based on covariance matrix analysis. Furthermore, a hybrid attention module (HAM) that emphasizes both edge distributions and position-wise dependencies is devised by combining EDA with a non-local block. Consequently, a conceptually end-to-end neural network, termed EDENet, is proposed to integrate HAM hierarchically for the detailed strengthening of multi-level representations. EDENet implicitly learns representative and discriminative features, providing useful and reasonable cues for dense prediction. Experimental results on the ISPRS Vaihingen, Potsdam and DeepGlobe datasets show its efficacy and superiority over state-of-the-art methods in overall accuracy (OA) and mean intersection over union (mIoU). In addition, the ablation study further validates the effects of EDA.
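For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how OA and mIoU are commonly computed from a pixel-level confusion matrix. The function names and the toy labels are illustrative assumptions, and the paper's exact evaluation protocol (for example, which classes are excluded) may differ.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are reference labels, columns are predicted labels."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """OA: correctly labelled pixels divided by all pixels."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """mIoU: per-class TP / (TP + FP + FN), averaged over the classes present."""
    n = len(cm)
    ious = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp
        fn = sum(cm[i]) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both reference and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example: three hypothetical classes over six flattened pixels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(overall_accuracy(cm), mean_iou(cm))
```

OA rewards getting the dominant classes right, whereas mIoU weights every class equally, which is why the two are usually reported together.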

Highlights

  • Semantic segmentation, a fundamental task for interpreting remote sensing imagery (RSI), is currently essential in various fields, such as water resource management [1,2], land cover classification [3,4,5], urban planning [6,7] and precision agriculture [8,9]

  • EDENet obtains the highest overall accuracy (OA) and mean intersection over union (mIoU) values, demonstrating exceptionally good accuracy

  • With EDENet, the OA and mIoU are relatively higher than those obtained on Vaihingen

Summary

Introduction

Semantic segmentation, a fundamental task for interpreting remote sensing imagery (RSI), is currently essential in various fields, such as water resource management [1,2], land cover classification [3,4,5], urban planning [6,7] and precision agriculture [8,9]. Zhang et al. [25] designed a multi-scale context aggregation network. This network encodes the raw image with the high-resolution network (HRNet) [26], in which four parallel branches generate feature maps of four sizes. The non-local neural network [32] was proposed to learn position-wise attention maps in both the spatial and channel domains. The purpose of this study has two aspects: (1) edge knowledge urgently needs to be explicitly modeled and incorporated into learnt representations, strengthening the network's discriminative capability in labeling pixels located in marginal areas; (2) the extraction and incorporation of edge distributions should be learnable and end-to-end trainable without breaking the inherent spatial structure. One is learning edge distributions modelled by the re-defined covariance matrix, following the inherent spatial structure of encoded feature maps.
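To make the covariance idea more concrete, here is a minimal sketch of the classical 2DPCA-style covariance matrices computed directly on 2D feature matrices, which preserves their spatial layout instead of flattening them into vectors. This is not the paper's re-defined covariance matrix, whose exact form is not given here; the helper names, the pairing of the two matrices with "row-wise" and "column-wise", and the toy inputs are all assumptions for illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def mean_matrix(mats):
    """Element-wise mean of a list of equally sized H x W matrices."""
    n, h, w = len(mats), len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(w)] for i in range(h)]


def two_d_covariances(mats):
    """Return (cov_cols, cov_rows) for a list of H x W feature matrices.

    cov_cols is W x W: average of (A - mean)^T (A - mean), relating columns.
    cov_rows is H x H: average of (A - mean) (A - mean)^T, relating rows.
    """
    n, h, w = len(mats), len(mats[0]), len(mats[0][0])
    mean = mean_matrix(mats)
    cov_cols = [[0.0] * w for _ in range(w)]
    cov_rows = [[0.0] * h for _ in range(h)]
    for m in mats:
        d = [[m[i][j] - mean[i][j] for j in range(w)] for i in range(h)]
        dt = transpose(d)
        cc, rr = matmul(dt, d), matmul(d, dt)
        for i in range(w):
            for j in range(w):
                cov_cols[i][j] += cc[i][j] / n
        for i in range(h):
            for j in range(h):
                cov_rows[i][j] += rr[i][j] / n
    return cov_cols, cov_rows


# Two hypothetical 2 x 3 feature matrices (e.g. two channels of one feature map).
a1 = [[1, 2, 3], [4, 5, 6]]
a2 = [[3, 2, 1], [6, 5, 4]]
cov_cols, cov_rows = two_d_covariances([a1, a2])
print(cov_cols)
print(cov_rows)
```

Both matrices share the same trace, since each sums the squared deviations of all entries; they differ only in whether that variation is organized by column or by row.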

Attention Mechanism
Revisiting 2DPCA
Non-Local Block
Overview
Overall
Re-Defining Covariance Matrix for Feature Matrix
Edge Distribution Attention Module
Hybrid Attention Module
Pipeline
Datasets
Hyper-Parameters and Implementation Details
Numerical Metrics
Comparison with State-of-the-Art
Results on Vaihingen Dataset
Results on Potsdam Dataset
Ablation Study of EDA
Discussions
Conclusions