Abstract

Detecting and localizing buildings is of primary importance in urban planning tasks, and automating the building extraction process has become increasingly attractive given the dominance of Convolutional Neural Networks (CNNs) in image classification tasks. In this work, we explore the effectiveness of the CNN-based architecture U-Net and its variations, namely, the Residual U-Net, the Attention U-Net, and the Attention Residual U-Net, in automatic building extraction. We showcase their robustness in feature extraction and information processing using exclusively RGB images, as they are a low-cost alternative to multi-spectral and LiDAR ones, selected from the SpaceNet 1 dataset. The experimental results show that U-Net achieves a 91.9% accuracy, whereas introducing residual blocks, attention gates, or a combination of both improves the accuracy of the vanilla U-Net to 93.6%, 94.0%, and 93.7%, respectively. Finally, the comparison between U-Net architectures and typical deep learning approaches from the literature highlights their increased performance in accurate building localization around corners and edges.
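The variants compared in the abstract differ from the vanilla U-Net in two building blocks: residual convolution blocks and additive attention gates on the skip connections. The sketch below is a minimal PyTorch illustration of these two components combined in a single decoder step; the module names (ResidualConvBlock, AttentionGate), channel sizes, and layer choices are illustrative assumptions and not the paper's exact configuration.

```python
# Minimal sketch (assumed configuration, not the authors' implementation):
# a residual convolution block and an additive attention gate of the kind
# used in Residual U-Net / Attention U-Net decoder steps.
import torch
import torch.nn as nn


class ResidualConvBlock(nn.Module):
    """Two 3x3 convolutions with a 1x1 shortcut (residual) connection."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.shortcut(x))


class AttentionGate(nn.Module):
    """Additive attention gate: the decoder signal g re-weights the skip features x."""
    def __init__(self, g_ch: int, x_ch: int, inter_ch: int):
        super().__init__()
        self.w_g = nn.Conv2d(g_ch, inter_ch, 1)
        self.w_x = nn.Conv2d(x_ch, inter_ch, 1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.Sigmoid())
        self.act = nn.ReLU(inplace=True)

    def forward(self, g, x):
        # g and x are assumed to share the same spatial size here.
        alpha = self.psi(self.act(self.w_g(g) + self.w_x(x)))  # attention map in [0, 1]
        return x * alpha                                        # gated skip connection


if __name__ == "__main__":
    skip = torch.randn(1, 64, 128, 128)   # encoder (skip-connection) features
    up = torch.randn(1, 64, 128, 128)     # upsampled decoder features
    gated = AttentionGate(64, 64, 32)(up, skip)
    out = ResidualConvBlock(128, 64)(torch.cat([gated, up], dim=1))
    print(out.shape)                      # torch.Size([1, 64, 128, 128])
```

The attention gate suppresses skip-connection activations that the coarser decoder signal deems irrelevant, while the residual shortcut eases gradient flow through the stacked convolutions; combining both yields the Attention Residual U-Net variant named in the abstract.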

Highlights

  • Building detection and localization are some of the most important tasks in land-cover classification [1,2,3] and urban planning [4,5,6], a consequence of the fact that citizens live and interact inside buildings for most of their time

  • In this work, we explore the efficacy of the U-Net architecture along with that of its variants, namely, the Residual U-Net, the Attention U-Net, and the Attention Residual U-Net, in automatic building extraction and localization

  • We present automatic building extraction from low-cost RGB images using various deep neural network architectures based on the U-Net model


Summary

Introduction

Building detection and localization are some of the most important tasks in land-cover classification [1,2,3] and urban planning [4,5,6], a consequence of the fact that citizens live and interact inside buildings for most of their time. It is necessary to accurately map each building's location during the initial urban planning procedure and, although the traditional surveying methods used for this purpose are highly accurate, they are both time consuming and costly. This has motivated research into taking advantage of other available resources that can represent most of the urban scene, for instance, data from satellite and aerial images. Detailed manual digitization of these images allows the extraction of building locations in maps with reduced time and cost compared to traditional surveying methods, while also providing the buildings' precise footprints [7]. The goals of this work are to (1) explore the effectiveness of U-Net-based architectures in automatic building extraction, (2) rely exclusively on RGB images as a low-cost alternative to multi-spectral and LiDAR images, and (3) represent information from remote sensing data using the lowest allowable image quality that captures natural urban scenes and surfaces.

Related Work and Motivation for Using RGB Data
U-Net-Based Architectures
Residual Block in U-Net
Attention U-Net
Attention ResU-Net Architecture
Dataset Description
Metrics
Experimental Results
Evaluation Metrics
Model Complexity
Conclusions