Abstract

Extracting buildings accurately from very high-resolution (VHR) remote sensing imagery is challenging due to diverse building appearances, spectral variability, and complex backgrounds. Recent studies mainly adopt variants of the fully convolutional network (FCN) with an encoder–decoder architecture to extract buildings, which has shown promising improvement over conventional methods. However, FCN-based encoder–decoder models still fail to fully utilize the implicit characteristics of building shapes. This adversely affects the accurate localization of building boundaries, which is particularly important in building mapping. A contour-guided and local structure-aware encoder–decoder network (CGSANet) is proposed to extract buildings with more accurate boundaries. CGSANet is a multitask network composed of a contour-guided (CG) module and a multiregion-guided (MRG) module. The CG module is supervised by building contours so that it effectively learns contour-related spatial features to retain the shape pattern of buildings. The MRG module is deeply supervised by four building region maps to further capture multiscale and contextual features of buildings. In addition, a hybrid loss function was designed to improve the structure-learning ability of CGSANet. These three improvements reinforce each other to produce high-quality building extraction results. Experimental results on the WHU and NZ32km2 building datasets demonstrate that, compared with the tested algorithms, CGSANet produces more accurate building extraction results and achieves the best intersection over union (IoU) values of 91.55% and 90.02%, respectively. Experiments on the INRIA building dataset further demonstrate the generalization ability of the proposed framework, indicating great practical potential.

Highlights

  • Buildings are one of the main artificial objects on the Earth

  • We propose a novel hybrid loss function, defined as the sum of weighted binary cross-entropy (BCE), the structural similarity index metric (SSIM), and weighted intersection over union (IoU), to optimize the model parameters from three perspectives: position-aware pixel-level similarity, local structural similarity, and position-aware global similarity

  • The weighted deep supervision (WDS) strategy increases the IoU of the model by 0.21% on the WHU dataset and 0.17% on the NZ32km2 dataset
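The hybrid loss above combines three complementary terms. A minimal NumPy sketch of its structure is given below; note that this is a simplified illustration, not the paper's exact formulation (the paper uses position-weighted BCE and IoU terms and a windowed SSIM, whereas here each term is unweighted and SSIM is computed globally over the map):

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Pixel-level binary cross-entropy (unweighted simplification)."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM, computed globally (the paper uses local windows)."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1.0 - ssim

def iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU between predicted probabilities and ground truth."""
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return 1.0 - (inter + eps) / (union + eps)

def hybrid_loss(pred, target):
    """Sum of the three terms: pixel-level, local-structure, global."""
    return bce_loss(pred, target) + ssim_loss(pred, target) + iou_loss(pred, target)
```

A perfect prediction drives all three terms toward zero, while the SSIM and IoU terms penalize structural and region-level disagreement that plain per-pixel BCE can under-weight near building boundaries.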



Introduction

Extracting buildings automatically and accurately from remote sensing data is of great significance in cadastral mapping, disaster management, urban monitoring, and many other geospatial applications [1], [2]. With advances in remote sensor technologies, very high-resolution (VHR) remote sensing data can now be acquired, making it possible to improve the quality of detected building boundaries. In practical applications, however, automatic and accurate building extraction from VHR remote sensing data remains challenging [3]. Developing automatic and robust methods for extracting buildings from VHR remote sensing data is therefore a non-trivial and meaningful task in the remote sensing community. Existing building extraction methods can be roughly divided into manually designed feature-based algorithms and deep-learning (DL)-based algorithms.

