Comparison of Different U-Net Models for Building Extraction from High-Resolution Aerial Imagery

Fırat Erdem, Uğur Avdan

doi:10.30897/ijegeo.684951

Open Access

Abstract

Building extraction from high-resolution aerial imagery plays an important role in geospatial applications such as urban planning, telecommunication, disaster monitoring, navigation, updating geographic databases, and urban dynamic monitoring. Automatic building extraction is a challenging task because buildings in different regions have different spectral and geometric properties. Classical image processing techniques are therefore not sufficient for automatic building extraction from high-resolution aerial imagery. Deep learning and semantic segmentation models, which have gained popularity in recent years, have been used for automatic object extraction from high-resolution images. In this study, the U-Net model, which was originally developed for biomedical image processing, was used for building extraction. The encoder part of the U-Net model was modified to use the Vgg16, InceptionResNetV2, and DenseNet121 convolutional neural networks, so building extraction was performed with three models: Vgg16 U-Net, InceptionResNetV2 U-Net, and DenseNet121 U-Net. In a fourth method, the results obtained from the three U-Net models were combined by majority voting to obtain the final result. This study aims to compare the performance of these four methods in building extraction from high-resolution aerial imagery. Images of Chicago from the Inria Aerial Image Labeling Dataset were used in the study. The images have a spatial resolution of 0.3 m, 8-bit radiometric resolution, and three bands (red, green, and blue). The 36 image tiles were divided into image subsets of 512x512 pixels, giving a total of 2715 image subsets. Of these, 80% (2172 image subsets) were used for training and 20% (543 image subsets) for testing. To evaluate the accuracy of the methods, the F1 score of the building class was employed. The F1 scores for the building class on the test images were 0.866, 0.860, 0.856, and 0.877 for Vgg16 U-Net, InceptionResNetV2 U-Net, DenseNet121 U-Net, and the majority voting method, respectively.
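
The fourth method and the evaluation metric can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each model's output has already been thresholded to a binary building mask (1 = building, 0 = background), that voting is done independently per pixel, and that F1 is computed for the building class as 2TP / (2TP + FP + FN). The function names `majority_vote` and `f1_score` and the toy 2x3 masks are hypothetical; the real masks would be 512x512 and would normally be held in array types rather than nested lists.

```python
def majority_vote(masks):
    """Combine binary masks pixel by pixel.

    masks: list of 2-D lists of 0/1 values, one per model, all the same shape.
    A pixel is labeled building (1) when more than half of the models say so.
    """
    n_models = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n_models else 0
         for c in range(width)]
        for r in range(height)
    ]


def f1_score(pred, truth):
    """F1 score of the positive (building) class for two binary masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1      # building predicted and present
            elif p == 1 and t == 0:
                fp += 1      # building predicted but absent
            elif p == 0 and t == 1:
                fn += 1      # building missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (illustrative only, not taken from the study).
truth = [[1, 1, 0], [0, 1, 0]]
vgg16_mask = [[1, 1, 0], [0, 0, 0]]
inception_mask = [[1, 0, 0], [0, 1, 1]]
densenet_mask = [[1, 1, 0], [1, 1, 0]]

ensemble = majority_vote([vgg16_mask, inception_mask, densenet_mask])
ensemble_f1 = f1_score(ensemble, truth)
vgg16_f1 = f1_score(vgg16_mask, truth)
```

In this toy case each model makes a different error, so the per-pixel vote cancels them and the ensemble scores higher than any single mask. That is the effect the reported scores suggest, although the gain on the real test images is much smaller (0.877 versus 0.866 for the best single model).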
