Monitoring sugarcane areas through remote sensing is essential for planning and managing the national sugarcane industry. Machine learning algorithms have brought many benefits to remote sensing. This article compares the prediction quality of three important machine learning methods for identifying sugarcane areas in Landsat images: Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF). LR was applied in three versions: without penalization (LR), with Ridge penalization (LR-R), and with Lasso penalization (LR-L). The study area covers approximately 306,000 ha in the state of São Paulo, Brazil, and was segmented into approximately 46,000 polygons (observations). Six spectral variables (spectral bands and vegetation indices), observed over 17 months, yielded 102 covariates, which were reduced via Principal Component Analysis (PCA). In total, 19 principal components, accounting for 94.61% of the cumulative explained variance ratio, were retained and used as inputs to the machine learning methods to classify each polygon as sugarcane or other land cover. On a test sample comprising 20% of the data, RF achieved the highest accuracy (78.51%), followed by DT (72.30%), LR-L (69.64%), LR-R (69.64%), and LR (69.52%).
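The abstract does not state the exact rule used to retain 19 components. A common rule keeps the smallest number of components whose cumulative explained variance ratio reaches a target. The following is a minimal NumPy sketch of that rule, assuming one row per polygon, 6 × 17 = 102 covariate columns, and standardized covariates (standardization is an assumption, since bands and indices have different scales). The function name `n_components_for_variance` and the synthetic data are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def n_components_for_variance(X: np.ndarray, threshold: float, standardize: bool = True):
    """Hypothetical helper: return the smallest number of principal components
    whose cumulative explained variance ratio reaches `threshold`, together
    with the cumulative ratios of all components."""
    # PCA operates on mean-centered data.
    Xc = X - X.mean(axis=0)
    if standardize:
        # Assumption: scale each covariate to unit variance; guard constant columns.
        std = Xc.std(axis=0)
        std[std == 0] = 1.0
        Xc = Xc / std
    # Squared singular values of the centered matrix are proportional to component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2
    cumulative = np.cumsum(var / var.sum())
    # First index where the cumulative ratio is >= threshold; +1 turns an index into a count.
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return k, cumulative


# Synthetic stand-in for the real covariates: polygons x (6 variables x 17 months).
rng = np.random.default_rng(0)
n_polygons, n_variables, n_months = 1000, 6, 17
latent = rng.normal(size=(n_polygons, 5))
loadings = rng.normal(size=(5, n_variables * n_months))
X = latent @ loadings + 0.1 * rng.normal(size=(n_polygons, n_variables * n_months))

k, cum = n_components_for_variance(X, 0.9461)
print(X.shape, k, round(float(cum[k - 1]), 4))
```

In a scikit-learn workflow, passing a float such as `PCA(n_components=0.9461)` selects the number of components by the same kind of cumulative-variance criterion; the retained component scores would then feed the LR, DT, and RF classifiers.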