A Stacking Ensemble Deep Learning Model for Building Extraction from Remote Sensing Images

Duanguang Cao, Mei-Po Kwan, Hanfa Xing, Yuan Meng, Huaqiao Xing, Man Sing Wong

doi:10.3390/rs13193898

Abstract

Automatically extracting buildings from remote sensing images with deep learning is of great significance to urban planning, disaster prevention, change detection, and other applications. Various deep learning models have been proposed to extract building information, each showing strengths and weaknesses in capturing the complex spectral and spatial characteristics of buildings in remote sensing images. To integrate the strengths of individual models and obtain fine-scale spatial and spectral building information, this study proposes a stacking ensemble deep learning model. First, an optimization method for the prediction results of the basic models is proposed based on fully connected conditional random fields (CRFs). On this basis, a stacking ensemble model (SENet) based on a sparse autoencoder integrating the U-NET, SegNet, and FCN-8s models is proposed to combine the features of the optimized basic model prediction results. Using several cities in Hebei Province, China as a case study, a building dataset containing attribute labels is established to assess the performance of the proposed model. The proposed SENet is compared with three individual models (U-NET, SegNet, and FCN-8s), and the results show that the accuracy of SENet is 0.954, approximately 6.7%, 6.1%, and 9.8% higher than that of the U-NET, SegNet, and FCN-8s models, respectively. The identification of building features, including colors, sizes, shapes, and shadows, is also evaluated, showing that the accuracy, recall, F1 score, and intersection over union (IoU) of the SENet model are higher than those of the three individual models. This suggests that the proposed ensemble model can effectively depict the different features of buildings and provides an alternative approach to building extraction with higher accuracy.
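The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions for a binary building mask (1 = building, 0 = background); the paper's exact formulas are not reproduced in this section, and the function names and toy masks below are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (1 = building)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def building_metrics(pred, truth):
    """Accuracy, recall, F1 and IoU under the usual definitions (an assumption)."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # IoU: overlap of predicted and true building pixels over their union.
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "f1": f1,
        "iou": iou,
    }


# Toy 2 x 3 masks, purely illustrative (not data from the paper).
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(building_metrics(predicted, reference))
```

Accuracy counts background pixels as well, so on scenes dominated by background it can stay high while IoU, which ignores true negatives, drops. That is one reason to report the metrics together.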

Highlights

Automatic extraction of buildings from remote sensing imagery is of great significance for many applications, such as urban planning, environmental research, change detection and digital city construction [1,2,3].
Spatial and textural features of images are extracted through mathematical descriptors, such as the scale-invariant feature transform (SIFT) [6], local binary patterns (LBPs) [7], and histograms of oriented gradients (HOGs) [8].
To integrate the feature advantages of different deep learning models and improve the robustness and stability of the model, a deep learning feature integration method based on a stacking ensemble technique was proposed for extracting buildings from remote sensing images.
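The stacking idea can be sketched roughly as follows: the per-pixel building probabilities from several base models become the input features of a second-level learner. This sketch is not the paper's SENet. It swaps in plain logistic regression as the meta-learner in place of the sparse autoencoder, skips the CRF refinement step, and uses made-up 2 x 2 probability maps; all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def stack_features(prob_maps):
    """Turn K per-model probability maps (H x W each) into one K-vector per pixel."""
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    return [[m[r][c] for m in prob_maps]
            for r in range(height) for c in range(width)]


def train_meta_learner(features, labels, epochs=5000, lr=1.0):
    """Fit logistic-regression weights by batch gradient descent.

    Stand-in meta-learner (assumption): the paper uses a sparse autoencoder.
    """
    k, n = len(features[0]), len(features)
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(k):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(features, weights, bias, threshold=0.5):
    """Label a pixel as building (1) when the fused probability reaches the threshold."""
    return [1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= threshold
            else 0 for x in features]


# Hypothetical building probabilities from three base models on a 2 x 2 tile.
unet = [[0.9, 0.2], [0.6, 0.4]]
segnet = [[0.8, 0.3], [0.7, 0.1]]
fcn8s = [[0.7, 0.1], [0.4, 0.6]]  # this model errs on the second row
labels = [1, 0, 1, 0]             # flattened reference mask, row by row

features = stack_features([unet, segnet, fcn8s])
weights, bias = train_meta_learner(features, labels)
fused = predict(features, weights, bias)
print(fused)
```

In practice the meta-learner would be trained on base-model predictions for held-out tiles rather than on the pixels it is later evaluated on; the toy example fits and predicts on the same four pixels only to keep the sketch short.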

Summary

Introduction

Automatic extraction of buildings from remote sensing imagery is of great significance for many applications, such as urban planning, environmental research, change detection and digital city construction [1,2,3]. The high spatial resolution of remote sensing imagery reveals fine details in urban areas and greatly facilitates automatic building extraction. Earlier methods made pixel-by-pixel predictions from hand-crafted features using classifiers such as support vector machines (SVMs) [9], adaptive boosting (AdaBoost) [10], random forests [11], and conditional random fields (CRFs) [12]. These methods rely heavily on manual feature designs and implementations, which generally change with the application area.
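To make "manual feature design" concrete, the sketch below computes basic 3 x 3 local binary patterns, one of the descriptors named in the highlights. It is a minimal illustration of the basic LBP operator, not the feature pipeline of any cited work; the neighbour ordering is a convention chosen here, and the function names and toy image are hypothetical.

```python
# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The ordering is a convention; any fixed order gives a valid LBP variant.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code: bit i is set when neighbour i is >= the centre pixel."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_map(image):
    """LBP codes for all interior pixels (border pixels are skipped)."""
    height, width = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, width - 1)]
            for r in range(1, height - 1)]


# Toy grey-level patch, purely illustrative.
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
print(lbp_map(patch))
```

A classical pipeline would histogram such codes over a window and pass the histogram to a classifier such as an SVM. Every choice along the way (neighbourhood size, ordering, window) is made by hand, which is the dependence on manual design noted above.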

Results

Discussion

Conclusion