Abstract

We analyze the effects of additional height data for semantic segmentation of aerial images with a convolutional encoder-decoder network. Besides purely image-based semantic segmentation, we trained the same network with height as an additional input; furthermore, we defined a multi-task model in which we trained the network to estimate the relative height of objects in parallel to semantic segmentation on the image data only. Our findings are that excellent results are possible for image data only, and that additional height information has no significant effect – neither when employed as extra input nor when used for multi-task training, even with differently weighted losses. Based on our results, we thus hypothesize that a strong encoder-decoder network implicitly learns the correlation of object categories and relative heights.
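The "differently weighted losses" in the multi-task setup can be illustrated with a minimal sketch. This is not the authors' code; the weight values and function names are illustrative assumptions, showing only how a segmentation loss and a relative-height regression loss might be combined into one training objective:

```python
# Hedged sketch (not the paper's implementation): combining a
# semantic-segmentation loss and a relative-height regression loss
# with fixed task weights, as in multi-task training with
# differently weighted losses.

def multitask_loss(seg_loss, height_loss, w_seg=1.0, w_height=0.5):
    """Weighted sum of the two task losses; weights are assumptions."""
    return w_seg * seg_loss + w_height * height_loss

# Example: down-weighting vs. equally weighting the height task.
print(multitask_loss(1.0, 0.5))                 # 1.25
print(multitask_loss(1.0, 0.5, w_height=1.0))   # 1.5
```

Sweeping `w_height` is one simple way to probe whether the auxiliary height task helps segmentation at all, which is the kind of comparison the abstract reports.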

Highlights

  • Semantic segmentation, i.e., pixel-wise classification, using Convolutional Networks (ConvNets) has been shown to produce very good results

  • Many different approaches have been published in recent years, starting from adapting well-known architectures for image classification and fine-tuning them for semantic segmentation (Long et al., 2015), up to specific architectures that are trained directly for semantic segmentation without any pre-training (Jégou et al., 2017)

  • Other authors have dealt with multi-task learning (Kendall et al., 2018), defining a ConvNet treating three tasks in parallel: semantic segmentation, instance segmentation and depth estimation from single images of road scenes


Introduction

Semantic segmentation, i.e., pixel-wise classification, using Convolutional Networks (ConvNets) has been shown to produce very good results. Other authors have presented a fusion model for semantic segmentation of aerial images, combining image data as well as height data in a single ConvNet (Zhang et al., 2017). In their experiments, they examined the influence of the height data when fusing it with the image data at different depths of the network, based on the sensitivity for single classes. Other authors have dealt with multi-task learning (Kendall et al., 2018), defining a ConvNet treating three tasks in parallel: semantic segmentation, instance segmentation and depth estimation from single images of road scenes.
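For the multi-task approach of Kendall et al. (2018), task weights are not fixed but derived from learned homoscedastic uncertainty. A commonly used simplified form of their objective can be sketched as follows; this is an illustration of the weighting scheme, not the exact implementation from either paper:

```python
import math

# Hedged sketch of uncertainty-based loss weighting in the spirit of
# Kendall et al. (2018): each task i carries a learned log-variance s_i,
# and the combined loss is sum(exp(-s_i) * L_i + s_i). The exp(-s_i)
# term weights the task; the + s_i term penalizes inflating the
# variance to make a task "free".

def uncertainty_weighted_loss(task_losses, log_vars):
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + s  # smaller s -> larger task weight
    return total

# With all log-variances at 0, every task weight is 1 and the
# losses simply add up.
print(uncertainty_weighted_loss([1.0, 2.0, 0.5], [0.0, 0.0, 0.0]))
```

In practice the `log_vars` would be trainable parameters of the network, updated jointly with the weights of the three task heads.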

