Abstract
Semantic segmentation has been a fundamental task in interpreting remote sensing imagery (RSI) for various downstream applications. Due to the high intra-class variants and inter-class similarities, inflexibly transferring natural image-specific networks to RSI is inadvisable. To enhance the distinguishability of learnt representations, attention modules were developed and applied to RSI, resulting in satisfactory improvements. However, these designs capture contextual information by equally handling all the pixels regardless of whether they around edges. Therefore, blurry boundaries are generated, rising high uncertainties in classifying vast adjacent pixels. Hereby, we propose an edge distribution attention module (EDA) to highlight the edge distributions of leant feature maps in a self-attentive fashion. In this module, we first formulate and model column-wise and row-wise edge attention maps based on covariance matrix analysis. Furthermore, a hybrid attention module (HAM) that emphasizes the edge distributions and position-wise dependencies is devised combing with non-local block. Consequently, a conceptually end-to-end neural network, termed as EDENet, is proposed to integrate HAM hierarchically for the detailed strengthening of multi-level representations. EDENet implicitly learns representative and discriminative features, providing available and reasonable cues for dense prediction. The experimental results evaluated on ISPRS Vaihingen, Potsdam and DeepGlobe datasets show the efficacy and superiority to the state-of-the-art methods on overall accuracy (OA) and mean intersection over union (mIoU). In addition, the ablation study further validates the effects of EDA.
Highlights
Semantic segmentation, a fundamental task for interpreting remote sensing imagery (RSI), is currently essential in various fields, such as water resource management [1,2], land cover classification [3,4,5], urban planning [6,7] and precision agriculture [8,9] and so forth
It is apparent that the highest overall accuracy (OA) and mean intersection over union (mIoU) values are obtained by enhanced semantic segmentation neural network (EDENet), demonstrating exceptionally good performance in accuracy
The OA and mIoU are relatively higher than Vaihingen by EDENet
Summary
A fundamental task for interpreting remote sensing imagery (RSI), is currently essential in various fields, such as water resource management [1,2], land cover classification [3,4,5], urban planning [6,7] and precision agriculture [8,9] and so forth. Zhang et al [25] designed a multi-scale context aggregation network This network encodes the raw image by the high-resolution network (HRNet) [26], in which four parallel branches are presented to generate four sizes of feature maps. The non-local neural network [32] was proposed to learn the position-wise attention maps both in spatial and channel domains. The purpose of this study includes two aspects: (1) the edge knowledge is urgent to be explicitly modeled and incorporated into learnt representations, facilitating the network’s discriminative capability in labeling pixels that position at marginal areas; (2) the extraction and incorporation of edge distributions should be learnable and end-to-end trainable without breaking the inherent spatial structure. One is learning edge distributions modelled by the re-defined covariance matrix following the inherently spatial structure of encoded feature maps.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have