Abstract Soy harvest failure events can severely impact farmers and insurance companies and raise global prices. Reliable seasonal forecasts of misharvests would allow stakeholders to prepare and take appropriate early action. However, especially for farmers, the reliability and lead time of current prediction systems provide insufficient information to justify within-season adaptation measures. Recent innovations have increased our ability to generate reliable statistical seasonal forecasts. Here, we combine these innovations to predict the 1–3 poor soy harvest years in the eastern United States. We first use a clustering algorithm to spatially aggregate crop-producing regions within the eastern United States that are particularly sensitive to hot–dry weather conditions. Next, we use observational climate variables [sea surface temperature (SST) and soil moisture] to extract precursor time series at multiple lags. This allows the machine learning model to learn the low-frequency evolution, which carries important information for predictability. A selection based on causal inference allows for physically interpretable precursors. We show that the robustly selected predictors are associated with the evolution of the horseshoe Pacific SST pattern, in line with previous research. We use the state of the horseshoe Pacific to identify years with enhanced predictability. We achieve high forecast skill for poor harvest events, even 3 months prior to sowing, using a strict one-step-ahead train-test splitting. Over the last 25 years, when the horseshoe Pacific SST pattern was anomalously strong, 67% of the poor harvests predicted in February were correct. When operational, this forecast would enable farmers to make informed decisions on adaptation measures, for example, selecting more drought-resistant cultivars or changing planting management.
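As an illustration only, the ideas of precursor values at multiple lags, a strict one-step-ahead train-test split, and the fraction of predicted poor harvests that were correct could be sketched as follows. The function names, the toy inputs, and the reading of "one-step-ahead" as an expanding training window are assumptions made for this sketch, not the authors' implementation.

```python
from typing import Iterator, List, Sequence, Tuple


def lagged_features(series: Sequence[float], lags: Sequence[int], t: int) -> List[float]:
    """Hypothetical helper: values of a precursor series at index t - lag, one per lag."""
    return [series[t - lag] for lag in lags]


def one_step_ahead_splits(n_years: int, min_train: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_indices, test_index) pairs.

    Each test year is predicted using only the years that precede it
    (assumed expanding-window scheme), so no future data leaks into training.
    """
    for test in range(min_train, n_years):
        yield list(range(test)), test


def precision(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Fraction of predicted events (e.g. poor harvests) that were actually observed."""
    hits = sum(p and o for p, o in zip(predicted, observed))
    n_pred = sum(predicted)
    return hits / n_pred if n_pred else float("nan")
```

With toy inputs, `precision([True, True, True, False], [True, True, False, True])` counts two correct predictions out of three predicted events, which is the kind of ratio the 67% figure above describes.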
Significance Statement If soy farmers knew that the upcoming growing season would be hot and dry, they could take anticipatory action to reduce losses, for example, by buying more drought-resistant soy cultivars or changing planting management. To make such decisions, farmers would need information even prior to sowing. At these very long lead times, a predictable signal can emerge from low-frequency processes of the climate system that can affect surface weather via teleconnections. However, traditional forecast systems are unable to make reliable predictions at these lead times. In this work, we used machine learning techniques to train a forecast model based on these low-frequency components. This allowed us to make reliable predictions of poor harvest years even 3 months prior to sowing.