Abstract

Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during the down-sampling process, which results in discontinuous segmentation masks or inaccurate segmentation boundaries. To compensate for this loss of shape information, two shape-related auxiliary tasks (i.e., boundary prediction and distance estimation) were jointly learned with the building segmentation task in our proposed network. Meanwhile, two consistency-constraint losses were designed on top of the multi-task network to exploit the duality between the mask prediction and the two shape-related predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. Based on these multi-scale features, one regression loss and two classification losses were used to predict the distance-transform map, segmentation mask, and boundary map. Two inter-task consistency-loss functions were constructed to ensure consistency between distance maps and masks, and between masks and boundary maps. Experimental results on three public aerial image datasets showed that our method achieved superior performance over recent state-of-the-art models.
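
To make the setup concrete, below is a minimal PyTorch sketch (not the authors' released code) of the three-head output stage described above: shared decoder features feed a mask head, a boundary head, and a distance head, trained with two classification losses and one regression loss. All module and variable names are hypothetical, and the choice of L1 for the distance regression is an assumption.

```python
# Hypothetical sketch of the three task heads and their per-task losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHeads(nn.Module):
    """Three 1x1-conv heads over shared decoder features (hypothetical names)."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.mask_head = nn.Conv2d(in_ch, 1, kernel_size=1)      # segmentation logits
        self.boundary_head = nn.Conv2d(in_ch, 1, kernel_size=1)  # boundary logits
        self.distance_head = nn.Conv2d(in_ch, 1, kernel_size=1)  # distance regression

    def forward(self, feats: torch.Tensor):
        return (self.mask_head(feats),
                self.boundary_head(feats),
                self.distance_head(feats))

def task_losses(mask_logit, bnd_logit, dist_pred, mask_gt, bnd_gt, dist_gt):
    """Two classification losses (mask, boundary) plus one regression loss (distance)."""
    l_mask = F.binary_cross_entropy_with_logits(mask_logit, mask_gt)
    l_bnd = F.binary_cross_entropy_with_logits(bnd_logit, bnd_gt)
    l_dist = F.l1_loss(dist_pred, dist_gt)  # L1 is an assumption; the paper may use MSE
    return l_mask + l_bnd + l_dist
```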

Highlights

  • Automatic building extraction from high-resolution remote-sensing images has important implications in urban planning, disaster monitoring, and 3D building reconstruction [1,2]

  • Deep convolutional neural networks (DCNNs) have achieved remarkable success, owing to their great capabilities in learning representative features, which significantly promote the accuracy of semantic segmentation [5,6]

  • Pooling and strided-convolution operations are repeated in DCNNs to enlarge the receptive field and obtain global-level semantic features; this down-sampling dramatically reduces the initial image resolution, risking the loss of important spatial details and often yielding unsatisfactory segmentation results with inaccurate edges

Summary

Introduction

Automatic building extraction from high-resolution remote-sensing images has important implications in urban planning, disaster monitoring, and 3D building reconstruction [1,2]. The distance-transform map's values change smoothly in space, which helps capture supplementary relationships between neighboring pixels that are largely ignored by binary boundary maps. Both sources of auxiliary information help compensate for the loss of shape information in building segmentation. Consistency constraints across the three tasks (i.e., distance, mask, and boundary prediction) are considered and constructed in the proposed multi-task network. These consistency constraints exploit the duality between the mask prediction and the two shape-related predictions, further improving building segmentation performance. The constructed consistency-constraint model can be readily plugged into existing segmentation networks.
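
As an illustration of how such auxiliary targets and consistency constraints can be realized, here is a hedged sketch under the following assumptions: the distance target is a signed Euclidean distance transform of the building mask, the distance-to-mask duality is enforced by soft-thresholding the predicted distance, and the mask-to-boundary duality is enforced by matching the spatial gradient of the mask probability to the boundary prediction. The exact formulations in the paper may differ; all function names below are hypothetical.

```python
# Hypothetical target generation and inter-task consistency losses.
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def make_targets(mask: np.ndarray):
    """mask: HxW binary array -> (signed distance map, binary boundary map)."""
    # Signed distance to the building outline; positive inside, negative outside.
    dist = distance_transform_edt(mask) - distance_transform_edt(1 - mask)
    # Boundary pixels: any pixel whose 4-neighborhood crosses the mask edge.
    pad = np.pad(mask, 1, mode="edge")
    boundary = ((pad[1:-1, 1:-1] != pad[:-2, 1:-1]) |
                (pad[1:-1, 1:-1] != pad[2:, 1:-1]) |
                (pad[1:-1, 1:-1] != pad[1:-1, :-2]) |
                (pad[1:-1, 1:-1] != pad[1:-1, 2:])).astype(np.float32)
    return dist.astype(np.float32), boundary

def distance_mask_consistency(dist_pred, mask_logit, k: float = 10.0):
    """Distance->mask duality: a positive signed distance implies 'building',
    so a soft-thresholded distance should agree with the mask probability."""
    mask_from_dist = torch.sigmoid(k * dist_pred)
    return F.mse_loss(mask_from_dist, torch.sigmoid(mask_logit))

def mask_boundary_consistency(mask_logit, bnd_logit):
    """Mask->boundary duality: the gradient magnitude of the mask probability
    should be high exactly where boundaries are predicted."""
    p = torch.sigmoid(mask_logit)                       # (N, 1, H, W)
    gx = (p[..., :, 1:] - p[..., :, :-1]).abs()
    gy = (p[..., 1:, :] - p[..., :-1, :]).abs()
    grad = F.pad(gx, (0, 1)) + F.pad(gy, (0, 0, 0, 1))  # back to (N, 1, H, W)
    return F.mse_loss(grad.clamp(0, 1), torch.sigmoid(bnd_logit))
```

These two losses can simply be added, with weights, to the three task losses during training; the weighting scheme here would be a tuning choice, not something specified by the source text.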

DCNN-Based Semantic Segmentation
Shape-Aware Segmentation
Building Extraction from Aerial Images
Different Output Representations
Consistency
Overall
Distance–Mask
Mask–Boundary
Overall Training Loss Function
Datasets and Implementation Details
Evaluation Metrics
Comparison with State-of-the-Art Methods
Methods
Ablation Experiments for Inter-Task Consistency Constraints
Efficiency Analysis
Qualitative Results
Conclusions