Accurate fine-scale mapping of sugarcane yield would benefit many aspects of managing the growth and harvest of sugarcane crops. Here, we combined high-spatial-resolution multi-sensor (optical and microwave) remote sensing data to estimate sugarcane yield in two steps. First, we retrieved 10 m resolution vegetation optical depth (VOD) from C-band Sentinel-1 synthetic aperture radar data via the water cloud model, then built a combined time series of VOD and the Sentinel-2 derived green–red vegetation index (GRVI, a traditional vegetation index). Second, we used the eXtreme Gradient Boosting (XGBoost) machine learning algorithm to estimate total sugarcane production from the aggregated mean monthly VOD and GRVI time series at the county scale. County-scale sugarcane yield data from 81 counties across Guangxi, China (2018–2020) were used for model training and testing. We then applied the best-performing trained model, unchanged, to predict pixel-scale (10 m resolution) yield from VOD and GRVI images. A SHapley Additive exPlanations (SHAP) approach was used to examine what XGBoost had learned about the mechanisms affecting yield. The main outcomes of this study were as follows. (1) The XGBoost model with VOD and GRVI time series provided reliable estimates of sugarcane yield, with a low relative error at both the county and pixel scales. (2) Growing-season predictions of sugarcane yield using combined VOD and GRVI data achieved high accuracy (R² = 0.815), captured specific spatial patterns of sugarcane yield across Guangxi province, and were feasible up to two months before harvest. (3) The SHAP analysis revealed that Sentinel-1 derived VOD data are important for predicting yield. This research demonstrates the benefits of combining VOD and GRVI data in automatic machine learning models for estimating sugarcane yield at high spatial and temporal resolutions.
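As an illustration of the feature-preparation step described above, the following minimal sketch computes GRVI with its standard definition, (green − red) / (green + red), and averages per-pixel values into mean monthly county-scale features. The function names, the record layout, and the reflectance values are hypothetical and chosen only for illustration; they are not taken from the study, and the VOD retrieval and XGBoost training are not shown.

```python
from collections import defaultdict
from statistics import mean


def grvi(green, red):
    """Green-red vegetation index: (green - red) / (green + red)."""
    total = green + red
    if total == 0:
        # Index is undefined when both reflectances are zero.
        return float("nan")
    return (green - red) / total


def monthly_county_means(records):
    """Average values per (county, month).

    records: iterable of (county, month, value) tuples.
    Returns a dict mapping (county, month) to the mean value.
    """
    groups = defaultdict(list)
    for county, month, value in records:
        groups[(county, month)].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical pixels: (county, month, green reflectance, red reflectance).
pixels = [
    ("county_A", 6, 0.10, 0.06),
    ("county_A", 6, 0.12, 0.06),
    ("county_A", 7, 0.09, 0.03),
]

records = [(county, month, grvi(g, r)) for county, month, g, r in pixels]
features = monthly_county_means(records)
```

The same aggregation could be applied to VOD values, so that each county and month contributes one VOD and one GRVI feature to a regression model.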