Abstract
Detecting and localizing buildings is of primary importance in urban planning tasks. Automating the building extraction process has become attractive given the dominance of Convolutional Neural Networks (CNNs) in image classification tasks. In this work, we explore the effectiveness of the CNN-based architecture U-Net and its variations, namely, the Residual U-Net, the Attention U-Net, and the Attention Residual U-Net, in automatic building extraction. We showcase their robustness in feature extraction and information processing using exclusively RGB images, selected from the SpaceNet 1 dataset, as they are a low-cost alternative to multi-spectral and LiDAR imagery. The experimental results show that U-Net achieves a 91.9% accuracy, whereas introducing residual blocks, attention gates, or a combination of both improves the accuracy of the vanilla U-Net to 93.6%, 94.0%, and 93.7%, respectively. Finally, the comparison between U-Net architectures and typical deep learning approaches from the literature highlights their increased performance in accurate building localization around corners and edges.
Highlights
Building detection and localization are among the most important tasks in land-cover classification [1,2,3] and urban planning [4,5,6], since citizens spend most of their time living and interacting inside buildings
In this work, we explore the efficacy of the U-Net architecture along with that of its variants, namely, the Residual U-Net, the Attention U-Net, and the Attention Residual U-Net, in automatic building extraction and localization
This work presented automatic building extraction from low-cost RGB images using various deep neural network architectures based on the U-Net model
Summary
Building detection and localization are among the most important tasks in land-cover classification [1,2,3] and urban planning [4,5,6], since citizens spend most of their time living and interacting inside buildings. It is necessary to accurately map each building’s location during the initial urban planning procedure; although traditional surveying methods are highly accurate, they are both time-consuming and costly. This has motivated research into other available resources that can represent most of the urban scene—for instance, data from satellite and aerial images. Detailed manual digitization of these images allows the extraction of building locations in maps with reduced time and cost compared to traditional surveying methods, while also providing buildings’ precise footprints [7]. To this end, this work uses RGB images exclusively, as they are a low-cost alternative to multi-spectral and LiDAR images, and represents information from remote sensing data using the lowest allowable image quality that captures natural urban scenes and surfaces.
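The attention gates that distinguish the Attention U-Net variants can be illustrated with a minimal NumPy sketch. This is a simplification under stated assumptions: the 1×1 convolutions of a real implementation are replaced by per-channel linear maps, batching is omitted, and all names (`attention_gate`, `W_x`, `W_g`, `psi`) are illustrative rather than taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(skip, gating, W_x, W_g, psi):
    """Additive attention gate in the style of Attention U-Net (simplified).

    skip   : (H, W, C)     encoder features carried over the skip connection
    gating : (H, W, C)     decoder (gating) features at the same resolution
    W_x, W_g : (C, C_int)  per-channel linear maps standing in for 1x1 convs
    psi    : (C_int, 1)    projection to a single attention coefficient
    """
    # Combine the skip and gating signals additively, then apply ReLU.
    f = np.maximum(skip @ W_x + gating @ W_g, 0.0)
    # One attention coefficient per spatial location, squashed into (0, 1).
    alpha = sigmoid(f @ psi)            # shape (H, W, 1)
    # Re-weight the skip features before the decoder concatenates them.
    return skip * alpha
```

After training, `alpha` close to 1 preserves a region (e.g., building corners and edges) while values near 0 suppress irrelevant background, which is consistent with the localization gains reported for the attention-based variants.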