Abstract. This article aimed to determine a workflow for more efficient large-scale crop mapping using a time series of Sentinel-2 satellite images, statistical methods of attribute selection, and machine learning. The proposed methodology explores the best possible combination of spectral variables related to vegetation (16 vegetation indices in the RGB, NIR, SWIR, and Red Edge regions) to characterize different spectro-temporal profiles of Land Use and Land Cover (LULC) in spatially heterogeneous landscapes. First, we applied a data dimensionality reduction analysis using Principal Component Analysis (PCA). Subsequently, the variables that showed the highest statistical correlation with each other were used in the spectro-temporal classification process with the Random Forest, TempCNN, and LightTAE algorithms, following three different strategies: C1 (ALL), C2 (BE + IV (Red Edge)), and C3 (BE + IV (without Red Edge)), where ALL denotes all variables, BE the spectral bands, and IV the vegetation indices. The C2 classification scenario (with bands B3, B4, B5, B6, B7, B8, and B8A and the NDRE1, RESI, and MSR indices) demonstrated the best LULC classification accuracy at the crop pattern level compared to the other scenarios, with average values of 0.91, 0.88, 0.91, 0.89, and 0.89 for Global Accuracy, Producer Accuracy, User Accuracy, Kappa index, and F1-Score, respectively, for the TempCNN model. These results emphasize the importance of both the qualitative and quantitative variability of the sampling data and of variables based on the Red Edge region for improving LULC classification in large-scale heterogeneous landscapes.
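For readers unfamiliar with the five accuracy measures reported above, the following is a minimal sketch of how they are conventionally derived from a confusion matrix. It is not the authors' code: the function name `classification_metrics`, the three-class matrix, and the choice to average the per-class values with an unweighted mean are all assumptions made for illustration, and the abstract does not state how its averages were computed.

```python
import numpy as np


def classification_metrics(cm):
    """Compute common accuracy measures from a confusion matrix.

    Assumed layout: cm[i, j] = number of samples whose reference class is i
    and whose predicted class is j. Every class is assumed to have at least
    one reference sample and one prediction (otherwise a division by zero
    occurs).
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    correct = np.diag(cm)
    reference_totals = cm.sum(axis=1)  # row sums
    predicted_totals = cm.sum(axis=0)  # column sums

    overall = correct.sum() / total
    producer = correct / reference_totals  # per-class recall
    user = correct / predicted_totals      # per-class precision
    f1 = 2 * producer * user / (producer + user)

    # Cohen's kappa: agreement beyond what chance alone would produce.
    expected = (reference_totals * predicted_totals).sum() / total**2
    kappa = (overall - expected) / (1 - expected)

    return {
        "overall_accuracy": overall,
        "producer_accuracy": producer.mean(),  # unweighted mean (assumption)
        "user_accuracy": user.mean(),          # unweighted mean (assumption)
        "kappa": kappa,
        "f1_score": f1.mean(),                 # unweighted mean (assumption)
    }


# Hypothetical confusion matrix for three LULC classes (not from the article).
example_cm = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 39],
]
metrics = classification_metrics(example_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Producer accuracy corresponds to per-class recall and user accuracy to per-class precision, which is why the F1-Score can be derived from the two.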