Abstract

Deep learning approaches have been widely applied to building footprint extraction from high-resolution imagery. However, the traditional fully convolutional network still struggles to recover spatial details and to discriminate buildings of varying sizes and styles. We propose a novel multipath hybrid attention network (MHA-Net) to address these challenges. We design a separable convolution block attention module and an attention downsampling module as the basic modules, both built on separable convolutions and channel attention. The MHA-Net architecture consists of three components: the encoding network, multipath hybrid dilated convolution (HDC), and dense upsampling convolution (DUC). The encoding network encodes the high-level semantic context of images. The multipath HDC aggregates multiscale features by combining the rich semantic representations extracted by HDCs, which yields promising results in extracting tiny buildings. The DUC recovers precise spatial information of buildings. We evaluate our network on two public datasets: the WHU aerial building dataset and the Massachusetts building dataset. According to the experimental results, MHA-Net outperforms other classical semantic segmentation models and several recent building extraction models. In particular, MHA-Net improves the extraction accuracy of small buildings and is robust to complicated building roofs.
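
To make two of the ideas named in the abstract more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes 3x3 kernels with stride 1, and the dilation rates (1, 2, 5) and (2, 2, 2) are illustrative values, not rates reported for MHA-Net. The helper names `stacked_receptive_field`, `covered_offsets`, and `depth_to_space` are hypothetical. The first two functions show why mixing dilation rates matters: a stack with a single repeated rate leaves gaps in the input positions that reach an output pixel, while a hybrid set covers them all. The third shows only the rearrangement step commonly used in dense upsampling convolution, where r*r low-resolution maps are interleaved into one map that is r times larger along each axis.

```python
def stacked_receptive_field(dilation_rates, kernel_size=3):
    """Receptive field along one axis of stacked stride-1 dilated convolutions."""
    field = 1
    for rate in dilation_rates:
        field += (kernel_size - 1) * rate
    return field


def covered_offsets(dilation_rates, kernel_size=3):
    """Input offsets along one axis that can influence a single output pixel."""
    half = kernel_size // 2
    offsets = {0}
    for rate in dilation_rates:
        taps = [k * rate for k in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def depth_to_space(channels, r):
    """Interleave r*r maps of size H x W into one map of size (H*r) x (W*r).

    Only the rearrangement is shown; the convolution that would produce
    the r*r maps is omitted (an assumption made to keep the sketch short).
    """
    height, width = len(channels[0]), len(channels[0][0])
    out = [[0] * (width * r) for _ in range(height * r)]
    for c, fmap in enumerate(channels):
        dy, dx = divmod(c, r)  # position of this map inside each r x r cell
        for y in range(height):
            for x in range(width):
                out[y * r + dy][x * r + dx] = fmap[y][x]
    return out


hybrid = covered_offsets([1, 2, 5])    # illustrative hybrid rates
uniform = covered_offsets([2, 2, 2])   # illustrative repeated rate
print(stacked_receptive_field([1, 2, 5]), len(hybrid))
print(stacked_receptive_field([2, 2, 2]), len(uniform))
print(depth_to_space([[[1]], [[2]], [[3]], [[4]]], 2))
```

With the hybrid rates, the number of covered offsets equals the receptive field, so no input position is skipped. With the repeated rate, only even offsets are reached, which is the gridding effect that hybrid dilation is meant to avoid.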

Highlights

  • As the fundamental entities in urban systems, buildings are the primary carriers of human production and life

  • According to the literature mentioned above, the major challenge of building extraction is to recover spatial detail and improve the discrimination of buildings with varying sizes and styles. To address this challenge and improve extraction accuracy, we propose a multipath hybrid attention network (MHA-Net) for automatic building footprint extraction

  • The results show that building extraction models outperform the classical deep learning models on this dataset

Summary

INTRODUCTION

As the fundamental entities in urban systems, buildings are the primary carriers of human production and life. Convolutional neural networks (CNNs) can automatically learn rich image features without prior knowledge via deep convolutional architectures. They have been widely used in remote sensing for object detection [18], hyperspectral image classification [19], and scene classification [20], [21]. Sun et al. [45] proposed a conditional GIS-aware network that employs complementary information from GIS data to extract building footprints from a very-high-resolution synthetic aperture radar image. According to the literature mentioned above, the major challenge of building extraction is to recover spatial detail and improve the discrimination of buildings with varying sizes and styles. To address this challenge and improve extraction accuracy, we propose a multipath hybrid attention network (MHA-Net) for automatic building footprint extraction. 1) An effective semantic segmentation model, MHA-Net, is proposed for building footprint extraction.

Architecture Overview
Encoding Network
Multipath Hybrid Dilated Convolution
Dense Upsampling Convolution
EXPERIMENTAL RESULTS
Dataset
Implementation Setting
Selected Models for Comparison
Experimental Results Using the WHU Aerial Building Dataset
Experimental Results Using the Massachusetts Building Dataset
Ablation Study
Experimental Results Using Different Dilation Rates
Complexity of MHA-Net
Findings
CONCLUSION