Abstract

Urban building segmentation is a prevalent research domain for very high resolution (VHR) remote sensing; however, the varied appearance of buildings and the complicated backgrounds in VHR remote sensing imagery make accurate semantic segmentation of urban buildings a challenge in relevant applications. Following the basic architecture of U-Net, an end-to-end deep convolutional neural network (denoted as DeepResUnet) was proposed, which can effectively perform urban building segmentation at pixel scale from VHR imagery and generate accurate segmentation results. The method contains two sub-networks: one is a cascade down-sampling network for extracting feature maps of buildings from the VHR image, and the other is an up-sampling network for reconstructing those extracted feature maps back to the same size as the input VHR image. The deep residual learning approach was adopted to facilitate training and to alleviate the degradation problem that often occurs in the model training process. The proposed DeepResUnet was tested on aerial images with a spatial resolution of 0.075 m and was compared, under exactly the same conditions, with six other state-of-the-art networks: FCN-8s, SegNet, DeconvNet, U-Net, ResUNet and DeepUNet. Results of extensive experiments indicated that the proposed DeepResUnet outperformed the other six networks in semantic segmentation of urban buildings in both visual and quantitative evaluation, especially in labeling irregularly shaped and small buildings with higher accuracy and completeness. Compared with the U-Net, the F1 score, Kappa coefficient and overall accuracy of DeepResUnet were improved by 3.52%, 4.67% and 1.72%, respectively. Moreover, the proposed DeepResUnet required far fewer parameters than the U-Net, highlighting its significant improvement among U-Net applications. Nevertheless, the inference time of DeepResUnet is slightly longer than that of the U-Net, which is subject to further improvement.
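
The three quantitative measures named in the abstract (F1 score, Kappa coefficient and overall accuracy) can be illustrated for a binary building/background labeling. The sketch below assumes the standard textbook definitions of these measures, since the abstract does not spell out the formulas used; the function name `binary_segmentation_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
def binary_segmentation_metrics(y_true, y_pred):
    """Overall accuracy, F1 score and Cohen's kappa for flat 0/1 pixel labels.

    Assumption: 1 = building, 0 = background, with standard metric definitions.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one pixel is required")

    # Count the four cells of the 2 x 2 confusion matrix.
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1

    n = tp + fp + fn + tn
    overall_accuracy = (tp + tn) / n

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = ((overall_accuracy - expected) / (1 - expected)
             if expected != 1 else 0.0)

    return {"overall_accuracy": overall_accuracy, "f1": f1, "kappa": kappa}


# Toy example: 8 pixels, 3 of which are building pixels in the reference.
reference = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_segmentation_metrics(reference, predicted)
```

In practice the reference and predicted label maps would be flattened from two-dimensional images before being passed to such a function.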

Highlights

  • One of the fundamental tasks in remote sensing is building extraction from remote sensing imagery

  • Semantic segmentation as an effective technique aims to assign each pixel in the target image into a given category [5]; it was quickly developed and extensively applied to urban planning and relevant studies including building/road detection [6,7,8], land use/cover mapping [9,10,11,12], and forest management [13,14] with the emergence of a large number of publicly available very high resolution (VHR) images

  • The deep residual learning approach was adopted to facilitate training and to alleviate the degradation problem that often occurs in the model training process, and a softmax classifier was added at the end of the proposed network to obtain the final segmentation results
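
The final softmax step mentioned in the last highlight can be sketched as follows: for each pixel, the network's raw class scores are converted to probabilities, and the most probable class becomes the label. This is a minimal illustration only; the helper names `softmax` and `label_map`, the two-class setup (0 = background, 1 = building) and the toy scores are assumptions, not details from the paper.

```python
import math


def softmax(scores):
    """Convert a list of raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_map(score_map):
    """Turn an H x W x C nested list of scores into an H x W map of class indices."""
    labels = []
    for row in score_map:
        label_row = []
        for pixel_scores in row:
            probs = softmax(pixel_scores)
            # Pick the index of the most probable class for this pixel.
            label_row.append(max(range(len(probs)), key=probs.__getitem__))
        labels.append(label_row)
    return labels


# Toy 2 x 2 image with two classes per pixel (assumed: 0 = background, 1 = building).
toy_scores = [
    [[2.0, 0.5], [0.1, 1.3]],
    [[-1.0, -1.5], [0.0, 3.0]],
]
toy_labels = label_map(toy_scores)
```

Because softmax preserves the ordering of the scores, the argmax of the probabilities equals the argmax of the raw scores; the probabilities matter mainly for training with a loss function and for inspecting prediction confidence.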

Summary

Introduction

One of the fundamental tasks in remote sensing is building extraction from remote sensing imagery. It plays a key role in applications such as urban construction and planning, natural disaster and crisis management [1,2,3]. Semantic segmentation, as an effective technique, aims to assign each pixel in the target image to a given category [5]; with the emergence of a large number of publicly available VHR images, it was quickly developed and extensively applied to urban planning and relevant studies including building/road detection [6,7,8], land use/cover mapping [9,10,11,12], and forest management [13,14]. As pointed out by Ball et al. [15], traditional feature learning approaches can work quite well, but several issues remain in the applications of these techniques and constrain their wide applicability.
