Timely and accurate predictions of winter wheat yields are key to ensuring food security. In this research, winter wheat yield prediction models for six provinces were established using a random forest (RF) model. Two methods were employed to analyze feature variables. RF partial dependence plots were generated to demonstrate the nonlinear relationships between the feature variables and yield, and bivariate Moran’s I was considered to identify the spatial associations between variables. Results showed that when environmental data from key growth periods were used for prediction model establishment, the root mean square error (RMSE) varied between 200 and 700 kg/ha, and the coefficient of determination (R2) exceeded 0.5. Feature variable analysis results indicated that the longitude, latitude, topography and normalized difference vegetation index (NDVI) were important variables. Below the threshold, the yield gradually increased with increasing NDVI. Bivariate Moran’s I results showed that there was zonal distribution of meteorological elements. Within a large spatial range, the change in environmental variables due to the latitude and longitude should be accounted for in modeling, but the influence of collinearity between the feature variables should be eliminated via variable importance analysis.
Read full abstract