Abstract

In this paper, we address the semantic segmentation of aerial imagery based on multi-modal data given in the form of true orthophotos and the corresponding Digital Surface Models (DSMs). We present the Deeply-supervised Shuffling Convolutional Neural Network (DSCNN), a multi-scale extension of the Shuffling Convolutional Neural Network (SCNN) with deep supervision. We take advantage of the SCNN, which involves the shuffling operator to effectively upsample feature maps, and fuse multi-scale features derived from its intermediate layers, which results in the Multi-scale Shuffling Convolutional Neural Network (MSCNN). Based on the MSCNN, we derive the DSCNN by introducing additional losses into the intermediate layers of the MSCNN. In addition, we investigate the impact of using different sets of hand-crafted radiometric and geometric features derived from the true orthophotos and the DSMs on the semantic segmentation task. For performance evaluation, we use a commonly used benchmark dataset. The achieved results reveal that both multi-scale fusion and deep supervision contribute to an improvement in performance. Furthermore, using a diversity of hand-crafted radiometric and geometric features as input to the DSCNN does not yield the best numerical results, but it produces smoother and improved detections for several objects.
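
To make these ideas concrete, the following is a minimal PyTorch sketch, our own illustration rather than the paper's actual architecture, of the three ingredients named above: the shuffling (sub-pixel) operator for upsampling feature maps, multi-scale fusion of intermediate features, and deep supervision via an auxiliary loss. All layer widths, the two-scale depth and the auxiliary-loss weight of 0.5 are assumptions chosen for brevity.

import torch
import torch.nn as nn

class ShufflingUpsampler(nn.Module):
    # Sub-pixel ("shuffling") upsampling: a convolution produces r*r output
    # channels per target channel, and PixelShuffle rearranges them into an
    # r-times larger spatial grid.
    def __init__(self, in_ch, out_ch, r):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * r * r, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x):
        return self.shuffle(self.conv(x))

class ToyDSCNN(nn.Module):
    # Two-scale toy encoder: the coarse feature map is shuffled back to full
    # resolution, fused with the fine map (multi-scale fusion), and also fed
    # to an auxiliary classifier (deep supervision).
    def __init__(self, in_ch, n_classes):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up2 = ShufflingUpsampler(64, 32, r=2)
        self.head = nn.Conv2d(64, n_classes, 1)      # main prediction on fused features
        self.aux_head = nn.Conv2d(32, n_classes, 1)  # auxiliary (intermediate) prediction

    def forward(self, x):
        f1 = self.enc1(x)                  # fine-scale features
        f2 = self.enc2(f1)                 # coarse-scale features
        f2_up = self.up2(f2)               # shuffled back to full resolution
        fused = torch.cat([f1, f2_up], dim=1)
        return self.head(fused), self.aux_head(f2_up)

# One training step: the total loss adds a weighted auxiliary loss to the
# main loss, which is the essence of deep supervision.
model = ToyDSCNN(in_ch=3, n_classes=6)
x = torch.randn(1, 3, 64, 64)              # dummy 3-channel tile
y = torch.randint(0, 6, (1, 64, 64))       # dummy per-pixel labels
main_logits, aux_logits = model(x)
loss = (nn.functional.cross_entropy(main_logits, y)
        + 0.5 * nn.functional.cross_entropy(aux_logits, y))
loss.backward()

In a full MSCNN/DSCNN, several encoder stages would be upsampled and fused in this way, with one auxiliary head per supervised intermediate layer.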

Highlights

  • The semantic segmentation of aerial imagery refers to the task of assigning a semantic label (e.g. Building, Impervious Surface, Car or Vegetation) to each pixel and thereby providing meaningful segments

  • We focus on the extraction of hand-crafted features as the basis for classification and on the construction of three different types of deep networks: the Shuffling Convolutional Neural Network (SCNN), the Multi-scale Shuffling Convolutional Neural Network (MSCNN) and the Deeply-supervised Shuffling Convolutional Neural Network (DSCNN)

  • We focus on a multi-scale extension of Shuffling Convolutional Neural Networks (Chen et al., 2018a; Chen et al., 2018b) involving deep supervision, and we thereby incorporate a diversity of hand-crafted radiometric and geometric features extracted from the true orthophotos and the corresponding Digital Surface Models (DSMs), as sketched below
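
To illustrate the feature extraction, here is a minimal NumPy sketch of a hand-crafted feature stack. It assumes ISPRS-style near-infrared/red/green orthophotos and a co-registered DSM; the specific features (NDVI, a crude normalized DSM, DSM gradient magnitude) are common choices in this setting and only assumptions about the paper's actual feature set.

import numpy as np

def handcrafted_features(ortho_nrg, dsm):
    # ortho_nrg: (H, W, 3) float array with near-infrared, red, green bands;
    # dsm: (H, W) float array of surface heights in metres.
    nir, red, green = ortho_nrg[..., 0], ortho_nrg[..., 1], ortho_nrg[..., 2]
    # Radiometric feature: NDVI, which highlights vegetation.
    ndvi = (nir - red) / (nir + red + 1e-8)
    # Geometric feature: height above a crude per-tile ground estimate
    # (a proper nDSM would subtract a filtered digital terrain model instead).
    ndsm = dsm - dsm.min()
    # Geometric feature: DSM gradient magnitude, which responds to height
    # discontinuities such as building outlines.
    gy, gx = np.gradient(dsm)
    grad_mag = np.hypot(gx, gy)
    # Stack spectral bands and derived features into one multi-channel input.
    return np.stack([nir, red, green, ndvi, ndsm, grad_mag], axis=-1)

# Example: a 256 x 256 tile yields a 6-channel input for the network.
ortho = np.random.rand(256, 256, 3).astype(np.float32)
dsm = (np.random.rand(256, 256) * 30.0).astype(np.float32)
features = handcrafted_features(ortho, dsm)
print(features.shape)  # (256, 256, 6)

Each channel of the resulting stack can be fed to the networks above in place of, or in addition to, the raw image bands.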

Introduction

The semantic segmentation of aerial imagery refers to the task of assigning a semantic label (e.g. Building, Impervious Surface, Car or Vegetation) to each pixel and thereby providing meaningful segments. Over the last few years, this kind of image interpretation has become a topic of great interest in remote sensing (Volpi and Tuia, 2017; Chen et al., 2018a; Maggiori et al., 2017; Marmanis et al., 2016; Paisitkriangkrai et al., 2016) and in the field of computer vision (Chen et al., 2016; Zhao et al., 2016; Liu et al., 2015; Badrinarayanan et al., 2017). Benchmarks such as the ISPRS Benchmark on 2D Semantic Labeling (Rottensteiner et al., 2012) have been initiated to foster research on the semantic segmentation of aerial imagery, and Convolutional Neural Networks (CNNs) are widely applied to this task (Volpi and Tuia, 2017; Marmanis et al., 2016; Maggiori et al., 2017; Paisitkriangkrai et al., 2016; Chen et al., 2018a).
