Accurate crop yield prediction is essential for global food security and effective agricultural management. Traditional empirical statistical models and crop models face significant limitations, including high computational demands and a dependence on high-resolution soil and daily weather data, which restrict their scalability across temporal and spatial scales. Moreover, a shortage of observational data further hinders the broad application of these methods. In this study, building on the Scalable Crop Yield Mapper (SCYM) method, we propose an integrated framework that combines crop models and machine learning (ML) techniques to improve crop yield modeling and the selection of vegetation indices. We evaluated three commonly used vegetation indices and three widely applied ML techniques. Additionally, we assessed the impact of combining meteorological and phenological variables on yield estimation accuracy. The results indicated that the green chlorophyll vegetation index (GCVI) outperformed the normalized difference vegetation index (NDVI) and the enhanced vegetation index (EVI) in linear models, achieving an R² of 0.31 and a root mean square error (RMSE) of 396 kg/ha. Non-linear ML methods, particularly LightGBM, performed better, with an R² of 0.42 and an RMSE of 365 kg/ha for GCVI. The combination of GCVI with meteorological and phenological data provided the best results, with an R² of 0.60 and an RMSE of 295 kg/ha. The proposed framework improves the accuracy and efficiency of winter wheat yield estimation, supporting more effective agricultural management and policymaking.
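
For reference, the three vegetation indices compared here are conventionally computed from surface reflectance as in the minimal sketch below. The formulas are the standard published definitions; the function names and the sample reflectance values are illustrative assumptions and are not taken from this study.

```python
# Standard definitions of the three vegetation indices, computed from
# surface reflectance in the near-infrared (nir), red, green and blue bands.
# Function names and sample values are hypothetical, for illustration only.

def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def gcvi(nir: float, green: float) -> float:
    """Green chlorophyll vegetation index."""
    return nir / green - 1.0


# Hypothetical reflectances for a dense green canopy (assumed values).
sample = {"nir": 0.45, "red": 0.05, "green": 0.08, "blue": 0.03}

indices = {
    "NDVI": ndvi(sample["nir"], sample["red"]),
    "EVI": evi(sample["nir"], sample["red"], sample["blue"]),
    "GCVI": gcvi(sample["nir"], sample["green"]),
}

for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

NDVI is bounded between -1 and 1, whereas GCVI is an unbounded ratio of near-infrared to green reflectance, so it continues to increase over dense canopies.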