Abstract

Deep learning is increasingly popular in the remote sensing community and has already proven successful in land cover classification and semantic segmentation. However, most studies are limited to optical datasets. Despite a few attempts to apply deep learning to synthetic aperture radar (SAR), its huge potential, especially for very high resolution (VHR) SAR, remains underexploited. Taking building segmentation as an example, VHR SAR datasets are, to the best of our knowledge, still missing. A comparable baseline for SAR building segmentation does not exist, and which segmentation method is more suitable for SAR imagery is poorly understood. This article first provides a benchmark high-resolution (1 m) GaoFen-3 SAR dataset covering nine cities in seven countries, then reviews the state-of-the-art semantic segmentation methods applied to SAR, and finally summarizes potential operations to improve performance. With these comprehensive assessments, we hope to provide recommendations and a roadmap for future SAR semantic segmentation.

Highlights

  • Because buildings are the main component of urban areas, building semantic segmentation attracts much attention in urban remote sensing studies

  • High-Resolution Net (HRNet) obtained the best performance on the RGB and synthetic aperture radar (SAR) datasets in terms of IoU and F1 scores, followed by U-Net

  • We investigated the performance with different pretraining weights for the ResNeXt101_32x8d encoder, including ImageNet, Instagram, SSL on ImageNet, and SWSL on ImageNet
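The highlights report segmentation quality in IoU and F1. As a refresher, the sketch below shows how these overlap metrics are typically computed for binary building masks; the function name and the toy arrays are illustrative, not taken from the paper.

```python
import numpy as np

def iou_and_f1(pred, target, eps=1e-7):
    """Compute IoU (Jaccard index) and F1 (Dice) for binary masks.

    pred, target: arrays of the same shape with {0, 1} or boolean values.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()   # building pixels found
    fp = np.logical_and(pred, ~target).sum()  # false alarms
    fn = np.logical_and(~pred, target).sum()  # missed building pixels
    iou = tp / (tp + fp + fn + eps)
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return iou, f1

# Toy example: prediction and ground truth share 1 of 3 marked pixels.
pred = np.array([[1, 1, 0], [0, 0, 0]])
target = np.array([[1, 0, 0], [1, 0, 0]])
iou, f1 = iou_and_f1(pred, target)  # IoU = 1/3, F1 = 1/2
```

Note that F1 (Dice) always weighs the overlap more generously than IoU for the same masks, which is why papers usually report both.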

Summary

INTRODUCTION

Because buildings are the main component of urban areas, building semantic segmentation attracts much attention in urban remote sensing studies. For instance, Shahzad et al. [19] adopted an integration of fully convolutional neural networks and conditional random fields to detect buildings in TerraSAR-X SAR images. Yao et al. [24] constructed datasets from three data sources (with a resolution of 2.9 m): TerraSAR-X images, Google Earth images, and Open Street Map (OSM) data, to perform SAR and optical image semantic segmentation. We included Google Earth images as the optical modality to thoroughly investigate the performance of different modalities and their combinations using deep-learning baseline models. These baseline models are fundamental to the community and can help us deeply understand the capability of state-of-the-art segmentation models when working with SAR data. 3) The influence of, and potential solutions to improve, the performance are given.

Study Area
Google Earth Images
Content of the GFB Dataset
BASELINE OF SEMANTIC SEGMENTATION
Pyramid Scene Parsing Network
Feature Pyramid Network
Deeplab
EXPERIMENTAL SETTINGS
Segmentation Models
Investigation on a Single Model
Investigation of Multiple Models
Other Influences
DISCUSSIONS
Is Advanced Pretraining Required?
Do We Need a Multichannel Mask and Postprocessing?
CONCLUSION