Effectiveness of Semi-Supervised Learning and Multi-Source Data in Detailed Urban Landuse Mapping with a Few Labeled Samples

Yang Zhang, Bo Sun, Qiming Zhou, Xinchang Zhang

doi:10.3390/rs14030648

Abstract

Detailed urban landuse information plays a fundamental role in smart city management. A sufficient sample size has been identified as a crucial prerequisite for machine learning algorithms in urban landuse classification. However, it is often difficult to recognize and label landuse categories from remote sensing images alone. The alternative, field investigation, is time-consuming and demands considerable human resources and monetary cost. Therefore, previous studies on urban landuse classification have often relied on a small number of labeled samples with a very uneven spatial distribution. This study aims to explore the effectiveness of a semi-supervised classification framework with multi-source data for detailed urban landuse classification with a few labeled samples. A disagreement-based semi-supervised learning approach, the Co-Forest, was employed and compared with traditional supervised methods (e.g., random forest and XGBoost). Multi-source geospatial data were utilized, including optical and nighttime light remote sensing data and geospatial big data, which represent the physical and socio-economic features of landuse categories. Taking urban landuse classification in Shenzhen City as a case, the results show that the classification accuracy of the semi-supervised method is generally on par with that of traditional supervised methods, and that fewer labeled samples are needed to achieve a comparable result under different training set ratios. Given a small sample size, the accuracy tends to be stable when training samples make up no less than 5% of the total. Our results also indicate that the classification accuracy obtained with multi-source data is significantly higher than that obtained with any single data source. Among these data, map POI and high-resolution optical remote sensing data make larger contributions to the classification, followed by mobile data and nighttime light remote sensing data.
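To make the idea of a disagreement-based approach more concrete, the following is a minimal sketch of one ingredient of Co-Forest-style learning: for each tree, the remaining trees (its concomitant ensemble) vote on an unlabeled sample, and the sample is pseudo-labeled only when their agreement reaches a threshold. This is an illustration under stated assumptions, not the implementation used in the study: the toy threshold "trees", the two-feature samples, the class names, the function names, and the threshold value of 0.75 are all hypothetical, and the error-rate checks and retraining steps of the full algorithm are omitted.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical stand-in: a "tree" is any callable mapping a feature vector to a class label.
Classifier = Callable[[Sequence[float]], Hashable]


def concomitant_vote(trees: List[Classifier], i: int, x: Sequence[float]) -> Tuple[Hashable, float]:
    """Label x with every tree except tree i; return (majority label, share of agreeing votes)."""
    votes = Counter(tree(x) for j, tree in enumerate(trees) if j != i)
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())


def select_pseudo_labels(trees, i, unlabeled, theta=0.75):
    """Return (x, label) pairs that tree i's concomitant ensemble labels with confidence >= theta."""
    selected = []
    for x in unlabeled:
        label, confidence = concomitant_vote(trees, i, x)
        if confidence >= theta:  # theta is an assumed confidence threshold
            selected.append((x, label))
    return selected


# Toy ensemble of four rule-based "trees" over two made-up features.
trees = [
    lambda x: "residential" if x[0] > 0.5 else "industrial",
    lambda x: "residential" if x[0] > 0.4 else "industrial",
    lambda x: "residential" if x[1] > 0.5 else "industrial",
    lambda x: "residential" if x[0] + x[1] > 1.0 else "industrial",
]
unlabeled = [(0.9, 0.8), (0.45, 0.3), (0.1, 0.2)]

# Samples that the other three trees would hand to tree 0 as pseudo-labeled training data.
for x, label in select_pseudo_labels(trees, 0, unlabeled):
    print(x, label)
```

In this toy run, the middle sample is left unlabeled because only two of the three voting trees agree on it, which falls below the assumed threshold. In a full pipeline, each tree would then be retrained on its labeled samples plus the pseudo-labeled ones it received.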

Highlights

The results indicate that the semi-supervised method performed better than the supervised classifiers and effectively reduced the demand for labeled samples in model training without reducing the classification accuracy
We explored the effectiveness of the semi-supervised Co-Forest algorithm and multi-source geospatial data in detailed urban landuse classification with a small sample size
Taking Shenzhen City as a case, the semi-supervised Co-Forest method achieved results comparable to those of traditional supervised classifiers such as random forest (RF) and XGBoost at a lower training set ratio

Summary

Introduction

Up-to-date urban landuse maps are in high demand for the management of a smart society. Remote sensing technology, which offers wide-range observation and rapid response to change, has been widely used in studies on urban landuse and land cover classification [1,2,3,4]. Traditional urban landuse classification techniques are based on multi-spectral remote sensing images. As the spatial resolution of remote sensing imagery has improved, geometric and texture features have been employed in addition to spectral features to obtain a more accurate classification [5,6,7].

Objectives

Methods

Discussion

Conclusion