Abstract

Up-to-date 3D building models are important for many applications. Airborne very high resolution (VHR) images, which are often acquired annually, offer an opportunity to keep such models up to date. Building segmentation is often the first and most important step. Convolutional neural networks (CNNs) have drawn much attention for interpreting VHR images because they can learn very effective features for very complex scenes. This paper employs Mask R-CNN to address two problems in building segmentation: detecting buildings at different scales and segmenting buildings with accurate edges. Mask R-CNN starts from a feature pyramid network (FPN), which creates semantically rich features at different scales. The FPN is integrated with a region proposal network (RPN) to generate objects of various scales, each with features at the corresponding optimal scale. These features, which carry both high- and low-level information, are then used for better classification of small objects and for mask prediction at edges. The method is tested on the ISPRS benchmark dataset and compared with fully convolutional networks (FCN), which merge high- and low-level features through a skip layer to create a single feature map for semantic segmentation. The results show that Mask R-CNN outperforms FCN by around 15% in detecting objects, especially small objects. Moreover, Mask R-CNN produces much better results than FCN in edge regions. The results also show that the choice of the range of anchor scales in Mask R-CNN is a critical factor in segmenting objects of different scales. This paper provides insight into how a good anchor scale should be chosen for different datasets.
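To illustrate why the range of anchor scales matters, the following minimal sketch computes the best intersection over union (IoU) that a building footprint can reach with any anchor from a given scale set, assuming the anchor and the building share the same centre. The scale sets, aspect ratios, and function names are illustrative assumptions and are not taken from the paper's configuration.

```python
import math

# Hypothetical aspect ratios (height / width); not the paper's settings.
RATIOS = (0.5, 1.0, 2.0)


def make_anchors(scale, ratios=RATIOS):
    """Return (width, height) pairs of area scale**2, one per aspect ratio."""
    anchors = []
    for ratio in ratios:
        width = scale / math.sqrt(ratio)
        height = scale * math.sqrt(ratio)
        anchors.append((width, height))
    return anchors


def centred_iou(box, anchor):
    """IoU of two axis-aligned boxes that share the same centre."""
    box_w, box_h = box
    anc_w, anc_h = anchor
    inter = min(box_w, anc_w) * min(box_h, anc_h)
    union = box_w * box_h + anc_w * anc_h - inter
    return inter / union


def best_iou(box_w, box_h, scales):
    """Best IoU between a box and any anchor generated from the scale set."""
    return max(
        centred_iou((box_w, box_h), anchor)
        for scale in scales
        for anchor in make_anchors(scale)
    )


# Two hypothetical anchor ranges, in pixels.
coarse_scales = (32, 64, 128, 256, 512)
fine_scales = (8, 16, 32, 64, 128)

# A small building footprint of 16 x 16 pixels.
iou_coarse = best_iou(16, 16, coarse_scales)
iou_fine = best_iou(16, 16, fine_scales)
```

Under these assumed values, the 16 × 16 pixel building reaches an IoU of only about 0.25 with the range that starts at 32 pixels, but 1.0 once a 16 pixel anchor is included. This suggests why small buildings can be missed when the anchor range does not match the object sizes in a dataset.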

Highlights

  • Up-to-date 3D building models are crucial for many applications, such as water management, flooding simulation and urban planning

  • Mask R-CNN consists of three networks: a feature pyramid network (FPN), a region proposal network (RPN) and Fast R-CNN

  • Fully convolutional networks (FCN) achieve slightly higher precision because FCN is slightly more complete in classifying large buildings, whereas the mask derived in Mask R-CNN relies on object detection


Summary

INTRODUCTION

Up-to-date 3D building models are crucial for many applications, such as water management, flooding simulation and urban planning. If large patches are selected, the coarse resolution of the CNN output caused by pooling, which is intended to extract high-level features, is prone to losing small objects (Yuan, 2018; Ren et al., 2018). FCN uses skip-layer techniques to merge high- and low-resolution feature maps for semantic segmentation. Objects of different scales are predicted from a single high-level feature map of fine resolution. Yuan (2018), Marmanis et al. (2018) and Bittner et al. (2018) reported that FCN (Long et al., 2015), the network with the most impact on semantic segmentation, does not perform well in detecting building edges from VHR images. Mask R-CNN (He et al., 2017) uses a region proposal network (Ren et al., 2015) to select the optimal level of feature maps from the pyramid for each detected region (object).
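As one possible illustration of choosing a pyramid level per region, the sketch below applies the level-assignment rule from the original FPN formulation, k = floor(k0 + log2(sqrt(w·h) / 224)), clamped to the available levels. Whether this paper uses exactly this rule and these constants is an assumption; the function name and default values are hypothetical.

```python
import math


def fpn_level(width, height, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Map a region of width x height pixels to a pyramid level.

    Larger regions are assigned to coarser (higher) levels, smaller
    regions to finer (lower) levels. The result is clamped to
    [k_min, k_max], the range of levels assumed to exist.
    """
    size = math.sqrt(width * height)
    level = math.floor(k0 + math.log2(size / canonical_size))
    return max(k_min, min(k_max, level))


# Example regions, in pixels (illustrative sizes only).
small_building = fpn_level(32, 32)      # clamped to the finest level
medium_building = fpn_level(112, 112)   # one level below the canonical one
large_building = fpn_level(448, 448)    # one level above the canonical one
```

With these defaults, a 32 × 32 pixel region maps to level 2, a 112 × 112 region to level 3, and a 448 × 448 region to level 5, so small buildings are described by the finest feature maps rather than by a single coarse one.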

MASK R-CNN
Experiment
Evaluation
Result
CONCLUSION
