In the United States, several land use and land cover (LULC) data sets are available based on satellite data, but these data sets often fail to accurately represent features on the ground. Alternatively, detailed mapping of heterogeneous landscapes for informed decision-making is possible using high spatial resolution orthoimagery from the National Agricultural Imagery Program (NAIP). However, large-area mapping at this resolution remains challenging due to radiometric differences among scenes, landscape heterogeneity, and computational limitations. Various machine learning (ML) techniques have shown promise in improving LULC maps. The primary purposes of this study were to evaluate bagging (Random Forest, RF), boosting (Gradient Boosting Machines [GBM] and extreme gradient boosting [XGB]), and stacking ensemble ML models. We used these techniques on a time series of Sentinel 2A data and NAIP orthoimagery to create a LULC map of a portion of Irion and Tom Green counties in Texas (USA). We created several spectral indices, structural variables, and geometry-based variables, reducing the dimensionality of features generated on Sentinel and NAIP data. We then compared accuracy based on random cross-validation without accounting for spatial autocorrelation and target-oriented cross-validation accounting for spatial structures of the training data set. Comparison of random and target-oriented cross-validation results showed that autocorrelation in the training data offered overestimation ranging from 2% to 3.5%. The XGB-boosted stacking ensemble on-base learners (RF, XGB, and GBM) improved model performance over individual base learners. We show that meta-learners are just as sensitive to overfitting as base models, as these algorithms are not designed to account for spatial information. Finally, we show that the fusion of Sentinel 2A data with NAIP data improves land use/land cover classification using geographic object-based image analysis.
Read full abstract