Abstract
Deep learning approaches have been widely applied to building footprint extraction using high-resolution imagery. However, the traditional fully convolutional network still has problems in recovering spatial details and discriminating buildings with varying sizes and styles. We propose a novel multipath hybrid attention network (MHA-Net) to address these challenges. We design a separable convolution block attention module and an attention downsampling module as the basic modules with separable convolutions and channel attention. The MHA-Net architecture consists of three components: the encoding network, multipath hybrid dilated convolution (HDC), and dense upsampling convolution (DUC). The encoding network is used to encode the high-level semantic contexts of images. The multipath HDC aggregates multiscale features by combining rich semantic representations extracted by HDCs, which can achieve promising results in extracting tiny buildings. The DUC is capable of recovering precise spatial information of buildings. We evaluate our network on two public datasets: the WHU aerial building dataset and the Massachusetts building dataset. According to the experimental results, MHA-Net outperforms other classical semantic segmentation models and several recent building extraction models. In particular, MHA-Net can improve the extraction accuracy of small buildings and is robust to complicated building roofs.
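To illustrate why a hybrid dilated convolution mixes dilation rates rather than repeating one, the following minimal sketch counts which input positions can influence a single output position after stacking 1-D dilated convolutions. It is not the MHA-Net implementation: the function names `receptive_offsets` and `has_gridding` are hypothetical, and the rates (2, 2, 2) and (1, 2, 5) are illustrative assumptions, since the abstract does not state the rates MHA-Net uses.

```python
from itertools import product


def receptive_offsets(dilation_rates, kernel_size=3):
    """Return the sorted 1-D input offsets that can reach one output position
    after stacking dilated convolutions with the given rates (assumed values)."""
    half = kernel_size // 2
    # Each layer samples its input at multiples of its dilation rate.
    taps_per_layer = [
        [k * rate for k in range(-half, half + 1)] for rate in dilation_rates
    ]
    # An input offset is reachable if it is a sum of one tap from every layer.
    return sorted({sum(combo) for combo in product(*taps_per_layer)})


def has_gridding(dilation_rates, kernel_size=3):
    """True if some positions inside the receptive field are never sampled."""
    offsets = receptive_offsets(dilation_rates, kernel_size)
    span = offsets[-1] - offsets[0] + 1
    return len(offsets) < span


if __name__ == "__main__":
    for rates in [(2, 2, 2), (1, 2, 5)]:
        offsets = receptive_offsets(rates)
        print(rates, "covers", len(offsets), "positions; gridding:", has_gridding(rates))
```

Under these assumptions, a repeated rate leaves holes in the receptive field, while mixed rates cover it densely. Dense coverage is one plausible reason an HDC-style design helps with very small buildings.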
Highlights
As the fundamental entities in urban systems, buildings are the primary carriers of human production and life.
According to the literature mentioned above, the major challenge of building extraction is to recover spatial detail and improve the discrimination of buildings with varying sizes and styles. To address this challenge and improve extraction accuracy, we propose a multipath hybrid attention network (MHA-Net) for automatic building footprint extraction.
The results show that building extraction models outperform the classical deep learning models on this dataset.
Summary
As the fundamental entities in urban systems, buildings are the primary carriers of human production and life. CNNs can automatically learn rich image features without prior knowledge via deep convolutional architectures. They have been widely used in remote sensing areas for object detection [18], hyperspectral image classification [19], and scene classification [20], [21]. Sun et al. [45] proposed a conditional GIS-aware network that employs complementary information from GIS data to extract building footprints from a very-high-resolution synthetic aperture radar image. According to the literature mentioned above, the major challenge of building extraction is to recover spatial detail and improve the discrimination of buildings with varying sizes and styles. To address this challenge and improve extraction accuracy, we propose a multipath hybrid attention network (MHA-Net) for automatic building footprint extraction. 1) An effective semantic segmentation model, MHA-Net, is proposed for building footprint extraction.