Abstract

Timely and accurate crop yield estimation is important for adjusting agronomic management and ensuring agricultural sustainability. Machine learning (ML) algorithms provide new opportunities to integrate agronomic information with ground-based and satellite data and to develop flexible yield prediction models. In particular, satellite-based vegetation indices and evapotranspiration provide robust proxies for crop yield estimation in the absence of measurements; nevertheless, most prior model development efforts have used only vegetation indices because they are simpler to derive. Additionally, the contribution of each input category (i.e., field, meteorological, and satellite data) and the use of appropriate proxies, aligned with the crop growth stages, in developing yield prediction models have not been adequately investigated. To address these challenges, we employed two ML techniques, Random Forest (RF) and extreme gradient boosting (XGB), to estimate wheat yield using meteorological variables, satellite-derived actual evapotranspiration (ETa), and vegetation indices (VIs). ETa was separately computed using the surface energy balance concept and the METRIC model. The models were first trained and tested in the study area using three input combinations: i) meteorological variables, ii) satellite data, and iii) an ensemble of meteorological and satellite data. Then, the best-performing model was further evaluated using two independent datasets. We found ETa to be particularly important in improving the accuracy of the model predictions. Among the vegetation indices, EVI, EVI2, and NDVI during May had the highest contributions to yield predictions; among the meteorological data, the highest contributions came from growing degree days during the grain filling stage and minimum temperature in the stem elongation stage.
Both ML algorithms generated relatively accurate results, with XGB marginally more accurate than RF: the average mean absolute error was 0.39 t ha⁻¹ for XGB and 0.50 t ha⁻¹ for RF. With XGB, the normalized root-mean-square errors of the ensemble, satellite-derived, and meteorological-derived models were 0.05, 0.07, and 0.10, respectively. Nevertheless, the performance of both algorithms deteriorated when predicting yield values beyond the range of the training set, though XGB handled the extrapolation more effectively than RF.
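For readers unfamiliar with the two error metrics reported above, the following is a minimal sketch of how mean absolute error and normalized root-mean-square error could be computed from observed and predicted yields. The abstract does not state how the RMSE was normalized, so the `normalizer` option (observed range or observed mean) is an assumption, and the function names and sample values are hypothetical and not taken from the study.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted yields (t/ha)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def normalized_rmse(observed, predicted, normalizer="range"):
    """RMSE divided by a scale of the observed yields.

    ASSUMPTION: the abstract does not say which normalization was used;
    "range" (max - min) and "mean" are two common conventions.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    if normalizer == "range":
        scale = max(observed) - min(observed)
    elif normalizer == "mean":
        scale = sum(observed) / n
    else:
        raise ValueError("normalizer must be 'range' or 'mean'")
    if scale == 0:
        raise ValueError("cannot normalize: scale of observed values is zero")
    return rmse / scale


# Hypothetical yields in t/ha, for illustration only.
observed_yield = [4.0, 5.0, 6.0]
predicted_yield = [4.5, 5.0, 5.5]
mae = mean_absolute_error(observed_yield, predicted_yield)
nrmse_by_range = normalized_rmse(observed_yield, predicted_yield, "range")
nrmse_by_mean = normalized_rmse(observed_yield, predicted_yield, "mean")
```

Because the two normalizations can give quite different values for the same predictions, NRMSE figures are only comparable across studies when the same convention is used.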
