Abstract

Very high resolution (VHR) remote sensing images contain objects at widely varying scales, such as large buildings and small cars. Widely used backbones with large downsampling factors (e.g., VGG-like and ResNet-like networks) cannot represent these multi-scale objects simultaneously, which has motivated a variety of context aggregation approaches, such as fusing low-level features and adding attention-based modules. To alleviate the problem caused by such backbones, we propose a feature-selection high-resolution network (FSHRNet) based on one observation: if the features maintain high resolution throughout the network, a high-precision segmentation result can be obtained using only a 1×1 convolution layer, with no need for complex context aggregation modules. Specifically, the backbone of FSHRNet is a multi-branch structure similar to HRNet, in which the high-resolution branch is the principal line. A lightweight dynamic weight module, named the feature-selection convolution layer (FSConv), is then used to fuse the multi-resolution features, allowing adaptive feature selection based on the characteristics of the objects. Finally, a specially designed 1×1 convolution layer derived from hypersphere embedding produces the segmentation result. Comparisons with other widely used methods show that the proposed FSHRNet achieves competitive performance on the ISPRS Vaihingen, ISPRS Potsdam, and iSAID datasets.
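
The abstract does not spell out how the hypersphere-derived 1×1 convolution is defined. As a rough illustration only, one common reading of "hypersphere embedding" is a cosine classifier: per-pixel features and per-class weight vectors are both L2-normalized, so the logits are scaled cosine similarities. The NumPy sketch below follows that assumption; the function name `hypersphere_conv1x1`, the `scale` parameter, and its default value are hypothetical and are not taken from the paper.

```python
import numpy as np


def hypersphere_conv1x1(features, class_weights, scale=16.0, eps=1e-12):
    """Cosine-similarity 1x1 classifier (illustrative assumption, not the paper's exact layer).

    features:      array of shape (C, H, W), high-resolution feature map
    class_weights: array of shape (K, C), one weight vector per class
    returns:       array of shape (K, H, W), per-pixel class logits
    """
    # Project every pixel's feature vector onto the unit hypersphere.
    f = features / (np.linalg.norm(features, axis=0, keepdims=True) + eps)
    # Project every class weight vector onto the unit hypersphere.
    w = class_weights / (np.linalg.norm(class_weights, axis=1, keepdims=True) + eps)
    # A 1x1 convolution is a per-pixel matrix product over the channel axis.
    return scale * np.einsum("kc,chw->khw", w, f)


# Usage with random data: 8 channels, 3 classes, a 4x5 feature map.
rng = np.random.default_rng(0)
logits = hypersphere_conv1x1(rng.normal(size=(8, 4, 5)), rng.normal(size=(3, 8)))
labels = logits.argmax(axis=0)  # per-pixel class prediction, shape (4, 5)
```

Under this assumption the logits depend only on the direction of each feature vector, not its magnitude, which is one plausible reason a single 1×1 layer could suffice once the features are kept at high resolution.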
