Highlights
Random forests equipped with recursive feature elimination were used to predict crop yield variation using diverse predictors.
The predictors belonged to the environmental, soils, or management domains.
The best-performing models combined predictors from all three domains and explained 75%-80% of spatiotemporal yield variance.
Extreme heat was the most important predictor among soils, management, and seasonal environment predictors for maize and soybean.
Irrigation intensity, tile drainage, % silt, and July rain were most important at the sub-seasonal aggregation scale.

Abstract. Crop yields are dictated by a complex interplay among environmental, edaphic (soil), and management factors that change across space and time. However, the extent to which each of these influences, and their interactions, explains yield variance remains poorly understood. The complexity of this question motivates the application of modern machine learning approaches to disentangle these influences and clarify how crop yield relates to its critical drivers. Here, we used random forest modeling with recursive feature elimination (RFE) to discern the diverse drivers of historical (1981-2019) county-level maize and soybean yields in the U.S. within the realms of environment (growing and extreme degree days, precipitation, vapor pressure deficit, evaporative demand, crop water use, soil moisture), soils (sand, silt, and clay contents, bulk density, soil organic carbon, available water capacity), and management (irrigation intensity, tile drainage, nitrogen input, depth to groundwater). We found that the most effective models selected predictors from all three realms and accounted for 75%-80% of the combined spatial and temporal variance.
Environmental predictors were indispensable to model performance, while their combination with either soil or management predictors approached the efficacy of the best-performing model. Specific variables within each individual predictor set and their combinations were analyzed for their relative importance to model skill. The best-performing model, which used soils, management, and seasonal environmental predictors, ranked extreme heat as the most influential predictor for both crops. When sub-seasonal environmental variables were included with soil and management predictors, the relative importance shifted: irrigation intensity became the most influential predictor, accompanied by tile drainage, silt content, and July precipitation for both crops. RFE revealed that only eight of the most relevant predictors were sufficient to explain >70% of the yield variance, even when as many as 52 predictors were included in total. Visual representations were developed to offer insights into the functional response of crop yield to changes in critical predictors, thereby facilitating predictions across diverse agricultural production systems and over time. This research underscores the value of including soil and management indicators alongside environmental predictors to improve understanding and predictability of the intricate dynamics governing crop yield variability.
Keywords: Climate variability, Drainage, Irrigation, Maize, Nitrogen, Random forest, Soils, Soybean.
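For readers unfamiliar with the approach named in the abstract, the following is a minimal sketch of random forest regression combined with recursive feature elimination, written with scikit-learn. It is not the authors' pipeline: the data are synthetic, the helper `select_predictors`, the predictor names (`signal_*`, `noise_*`), and all settings (50 trees, a 25% hold-out split, one predictor dropped per round) are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split


def select_predictors(X, y, names, n_keep, seed=0):
    """Hypothetical helper: pick n_keep predictors with random-forest RFE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    forest = RandomForestRegressor(n_estimators=50, random_state=seed)
    # RFE fits the forest, drops the least important predictor (step=1),
    # and repeats until only n_keep predictors remain.
    rfe = RFE(estimator=forest, n_features_to_select=n_keep, step=1)
    rfe.fit(X_train, y_train)
    kept = [name for name, keep in zip(names, rfe.support_) if keep]
    # R^2 on held-out rows: share of variance explained by the reduced model.
    r2 = rfe.score(X_test, y_test)
    return kept, r2


# Synthetic stand-in data (an assumption, not the study's dataset):
# three informative predictors and seven pure-noise predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2]
     + rng.normal(scale=0.1, size=300))
names = [f"signal_{i}" for i in range(3)] + [f"noise_{i}" for i in range(7)]

kept, r2 = select_predictors(X, y, names, n_keep=3)
print("Retained predictors:", kept)
print("Held-out R^2:", round(r2, 2))
```

In this sketch the held-out R² plays the role of "variance explained"; a study with spatial and temporal structure would likely need a grouped or blocked validation scheme instead of a random split.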