Abstract

A barrier to utilizing machine learning in seasonal forecasting applications is the limited sample size of observational data for model training. To circumvent this issue, here we explore the feasibility of training various machine learning approaches on a large climate model ensemble, providing a long training set of physically consistent model realizations. After training on thousands of seasons of climate model simulations, the machine learning models are tested on seasonal forecasts across the historical observational period (1980–2020). For forecasting large-scale spatial patterns of precipitation across the western United States, we show that these machine learning-based models are capable of competing with or outperforming existing dynamical models from the North American Multi-Model Ensemble. We further show that this approach need not be considered a 'black box' by utilizing machine learning interpretability methods to identify the relevant physical processes that lead to prediction skill.

Highlights

  • This study has tested a novel approach for seasonal forecasting of western US precipitation

  • A range of machine learning approaches have been trained on a large ensemble of climate model simulations, and their predictions combined in an ensemble to predict large-scale patterns of precipitation anomalies

Methods

Two parameters were tuned in the Random Forest (RF) model: the number of trees, set to 5000, and the number of variables randomly sampled at each split, set to 10. These parameter choices were based on tuning across the CESM-LENS training dataset, though sensitivity testing revealed stable results across a range of parameter choices, provided that the number of trees was sufficiently large. Lag choices were informed by sensitivity testing of the out-of-bag accuracy across the CESM-LENS training dataset to how each variable was lagged (Figs. S9 and S10). The first EOF of each SST variable/region was lagged at 1-month intervals up to 12 months, and the second EOFs were lagged up to 6 months. All other predictor variables were not lagged, such that only October values were used to make the NDJ predictions and only December values were used to make the JFM predictions.
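The setup described above can be sketched in scikit-learn. This is an illustrative sketch only, not the study's actual pipeline: the synthetic principal-component series, the `lagged_predictors` helper, and the placeholder target are assumptions standing in for the real EOF time series and precipitation anomalies; only the tuned parameter values (5000 trees, 10 variables sampled per split) and the lag structure (12 monthly lags for first EOFs, 6 for second EOFs) come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)


def lagged_predictors(pc, max_lag):
    """Stack a PC time series at 1-month lags from 1 to max_lag months.

    Row t of the output holds the values at times t-1, ..., t-max_lag,
    so each row pairs a target month with its lagged predictors.
    """
    n = len(pc)
    cols = [pc[max_lag - lag : n - lag] for lag in range(1, max_lag + 1)]
    return np.column_stack(cols)


# Synthetic stand-ins for EOF principal-component series (placeholder data;
# in the study these would come from SST EOFs over selected regions).
months = 240
pc1 = rng.standard_normal(months)  # first EOF: lagged up to 12 months
pc2 = rng.standard_normal(months)  # second EOF: lagged up to 6 months

# Align both lagged blocks on the same target months (t = 12, ..., months-1).
X = np.column_stack([
    lagged_predictors(pc1, 12),
    lagged_predictors(pc2, 6)[6:],
])
y = rng.standard_normal(X.shape[0])  # placeholder precipitation target

# Parameter values from the text: 5000 trees, 10 variables per split.
# oob_score=True exposes the out-of-bag accuracy used for sensitivity testing.
rf = RandomForestRegressor(
    n_estimators=5000,
    max_features=10,
    oob_score=True,
    n_jobs=-1,
    random_state=0,
)
rf.fit(X, y)
```

Because the target here is random noise, the out-of-bag score carries no meaning in this sketch; with real predictors and targets, `rf.oob_score_` gives the internal skill estimate without a separate validation split.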

