Global de-trending significantly improves the accuracy of XGBoost-based county-level maize and soybean yield prediction in the Midwestern United States

Yuanchao Li, Xingli Qin, Miao Zhang, Hongwei Zeng, Bingfang Wu

doi:10.1080/15481603.2024.2349341

Abstract

The application of machine learning to crop yield prediction has gained considerable traction, yet it remains unclear how yield trends affect these predictions and how the available detrending methods differ. In this study, we used extreme gradient boosting (XGBoost) to examine the effects of five trend treatments on maize and soybean yield prediction in the Midwestern United States: no trend processing (NTP), input year as a feature (IYF), input average yield as a feature (IAYF), input linear yield as a feature (ILYF), and the global detrending method (GDT). Compared with NTP, incorporating the yield trend as a predictor in XGBoost significantly improved the accuracy and reduced the uncertainty of yield prediction. GDT performed best, reducing the average yield prediction error relative to NTP by 0.091 t/ha for soybean and 0.158 t/ha for maize, and improving the coefficient of determination (R²) by 20.6% and 19.6% for soybean and maize, respectively. Compared with IYF, IAYF, and ILYF, GDT improved R² by 3.8% to 12.7% for soybean and by 3.6% to 12.7% for maize. The SHapley Additive exPlanations (SHAP) framework showed that the enhanced vegetation index (EVI), particularly during the soybean podding and maize dough formation stages, played a crucial role in explaining interannual yield variability. These findings confirm the importance of GDT in machine-learning-based crop yield prediction and can inform future machine learning applications for yield forecasting.
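The abstract does not define GDT in detail, so the following is only a minimal sketch of one plausible reading: a single linear trend of yield against year is fitted to the records pooled across all counties, the model is trained on the residuals, and the trend is added back to the predictions. The function names, the toy records, and the choice of a linear trend are all assumptions made for illustration, and the XGBoost training step is left out.

```python
from statistics import mean


def fit_global_trend(years, yields):
    """Fit yield = intercept + slope * year by least squares on pooled records."""
    year_mean = mean(years)
    yield_mean = mean(yields)
    sxx = sum((y - year_mean) ** 2 for y in years)
    sxy = sum((y - year_mean) * (z - yield_mean) for y, z in zip(years, yields))
    slope = sxy / sxx
    intercept = yield_mean - slope * year_mean
    return slope, intercept


def detrend(years, yields, slope, intercept):
    """Subtract the global trend; the residuals would be the training target."""
    return [z - (intercept + slope * y) for y, z in zip(years, yields)]


def retrend(years, residuals, slope, intercept):
    """Add the global trend back to (predicted) residuals."""
    return [r + intercept + slope * y for y, r in zip(years, residuals)]


# Hypothetical toy records: (county, year, yield in t/ha). Not data from the study.
records = [
    ("A", 2001, 8.0), ("A", 2002, 8.2), ("A", 2003, 8.1), ("A", 2004, 8.5),
    ("B", 2001, 9.0), ("B", 2002, 9.1), ("B", 2003, 9.4), ("B", 2004, 9.3),
]
years = [year for _, year, _ in records]
yields = [value for _, _, value in records]

# One trend for all counties, as opposed to one trend per county.
slope, intercept = fit_global_trend(years, yields)
residuals = detrend(years, yields, slope, intercept)

# In a full pipeline a regressor would predict the residuals from the
# predictors; here the true residuals stand in for those predictions.
restored = retrend(years, residuals, slope, intercept)
```

Under this reading, the contrast with IYF, IAYF, and ILYF is that the trend is removed from the target rather than supplied to the model as an additional feature.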

Full Text