Abstract

Deep convolutional networks are of great significance for the automatic semantic annotation of remotely sensed images. Object localization and semantic labeling are equally important in semantic segmentation tasks. However, the convolution and pooling operations of a convolutional network reduce image resolution while extracting semantic information, which puts acquiring semantics and capturing positions in tension. We design a duplex restricted network with guided upsampling, whose detachable enhancement structure separates opposing features at the same level. In this way, the network can adaptively trade off the classification and localization tasks. To optimize the detailed information obtained by encoding, a concentration-aware guided upsampling module is further introduced to replace the traditional upsampling operation for resolution restoration. We also add a content capture normalization module to enhance the features extracted in the encoding stage. Our approach uses fewer parameters and significantly outperforms previous results on two very high resolution (VHR) datasets: 84.81% (vs 82.42%) on the Potsdam dataset and 86.76% (vs 82.74%) on the Jiage dataset.

Highlights

  • The semantic segmentation task is to assign a semantic label to each pixel of an image, a fundamental task in the field of computer vision

  • We propose a novel duplex path element extraction network, termed the duplex restricted network (DRN) with guided upsampling, to effectively utilize the information encoded by the backbone

  • We describe the overall framework of the duplex restricted network with guided upsampling and elaborate the implementation details of its three components: (i) a content capture normalization module based on atrous convolution; (ii) a detachable enhancement structure for obtaining different characteristics from the same receptive field; (iii) a novel upsampling strategy (CAGU) to achieve higher performance
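The content capture normalization module is described as being based on atrous (dilated) convolution, which enlarges the receptive field without adding parameters or reducing resolution. The paper does not give its implementation; the following is a minimal numpy sketch of the underlying atrous convolution operation only, with a made-up single-channel input:

```python
import numpy as np

def atrous_conv2d(x, kernel, rate):
    """Naive 2D atrous (dilated) convolution with 'valid' padding.
    With dilation rate r, a k x k kernel covers an effective receptive
    field of k + (k - 1) * (r - 1) pixels per side, using the same
    k * k weights regardless of the rate."""
    k = kernel.shape[0]
    eff = k + (k - 1) * (rate - 1)          # effective kernel extent
    H, W = x.shape
    out = np.zeros((H - eff + 1, W - eff + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the input with stride `rate` inside the window
            patch = x[i:i + eff:rate, j:j + eff:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(36, dtype=float).reshape(6, 6)   # toy feature map
k = np.ones((3, 3))                            # toy 3x3 kernel
y1 = atrous_conv2d(x, k, rate=1)   # receptive field 3x3, output 4x4
y2 = atrous_conv2d(x, k, rate=2)   # receptive field 5x5, output 2x2
```

With rate 2 the same nine weights see a 5x5 region, which is how such a module captures wider context at unchanged cost.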


Summary

INTRODUCTION

The semantic segmentation task is to assign a semantic label to each pixel of an image, a fundamental task in the field of computer vision. There are three main solutions to preserving accurate location information while extracting context information: (i) Multiscale inference [16]–[18] feeds images of different resolutions into the network and fuses the multiscale results. It does not change the size of the network's receptive field; instead, it allows the network to share diverse information by adjusting the scale of the input image, which improves the network's ability to process multiscale elements. Our model uses 23.9M parameters and obtains an 83.76% mIoU on the Potsdam test set and an 85.87% mIoU on the Jiage test set
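The multiscale inference scheme (i) can be sketched as follows. This is a generic illustration, not the cited papers' exact method: `resize_nn` is a nearest-neighbour stand-in for the bilinear interpolation usually used, and `model` is any function mapping an image to a same-sized per-pixel score map:

```python
import numpy as np

def resize_nn(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D array (stand-in for bilinear)."""
    H, W = img.shape
    rows = np.arange(out_h) * H // out_h
    cols = np.arange(out_w) * W // out_w
    return img[rows][:, cols]

def multiscale_inference(model, img, scales=(0.5, 1.0, 2.0)):
    """Run the model on rescaled copies of the image, resize each
    prediction back to the original resolution, and average them."""
    H, W = img.shape
    fused = np.zeros((H, W))
    for s in scales:
        x = resize_nn(img, max(1, int(H * s)), max(1, int(W * s)))
        pred = model(x)                 # per-pixel scores, same size as x
        fused += resize_nn(pred, H, W)  # back to original resolution
    return fused / len(scales)

img = np.arange(16, dtype=float).reshape(4, 4)       # toy image
model = lambda x: (x > x.mean()).astype(float)        # toy pixel scorer
fused = multiscale_inference(model, img)              # fused 4x4 scores
```

The receptive field of `model` is untouched; only the input scale varies, so each pass sees the scene at a different granularity before fusion.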

RELATED WORKS
EXPERIMENTS
EVALUATION CRITERIA
EVALUATION RESULTS
Findings
CONCLUSION