Variability among Machine Learning Explanations for Precipitation Forecasting in Köppen Climate Zones

Ali Ulvi Galip Senocak, Sinan Kalkan, M. Tugrul Yilmaz, Ismail Yucel, Muhammad Amjad

doi:10.5194/egusphere-egu24-17782

Abstract

A plethora of studies have used machine learning for quantitative precipitation forecasting. However, only a fraction of those studies have focused on the explainability of the machine learning models they use. To the best of the authors' knowledge, the variability in explainability across predictor clusters (i.e., predictor categories grouped by shared attributes such as climate type) has not received attention in the literature. This study addresses this gap by analyzing the variability of model-level explanations across different Köppen climate zones (i.e., arid, temperate, and continental climates). To this end, Türkiye, which has a complex topography and diverse climate types, is selected as the study area. The dataset covers 687 stations spanning 10 different climate zones (clustered into the Köppen B, C, and D zones) and contains more than one million rows over a four-year period. The ground truth is the daily observed precipitation amount. The predictors consist of daily total precipitation forecasts from numerical weather prediction models (ECMWF, GFS, ALARO, and WRF) with a 24-hour lead time, geographical parameters (elevation, roughness, slope, aspect, distance to the sea, latitude, and longitude), and seasonality (day of the year and month). The machine learning method is a multi-layer perceptron (MLP) with two hidden layers using the Gaussian Error Linear Unit (GELU) non-linearity, which achieves a root mean squared error (RMSE) of 3.6 mm/day. The Huber loss (delta = 1.5) is used as the loss function to mitigate the adverse effects of the long-tailed dataset. The Local Interpretable Model-agnostic Explanations (LIME) approach is used to explain the predictions of the MLP.
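The model components described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: only the two-hidden-layer structure, the GELU non-linearity, and delta = 1.5 come from the abstract, while the helper names (`gelu`, `huber_loss`, `dense`, `mlp_forward`), the weights, and the choice of the exact erf-based GELU (rather than the tanh approximation) are hypothetical. It shows why the Huber loss suits a long-tailed target: beyond delta the penalty grows linearly instead of quadratically, so rare heavy-rain days do not dominate training.

```python
import math
from typing import List, Sequence, Tuple

# Delta value stated in the abstract; everything else here is illustrative.
HUBER_DELTA = 1.5

# Hypothetical dense layer representation: (weights, bias), one weight row per output unit.
Layer = Tuple[List[List[float]], List[float]]


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def huber_loss(y_true: Sequence[float], y_pred: Sequence[float],
               delta: float = HUBER_DELTA) -> float:
    """Mean Huber loss: quadratic for |residual| <= delta, linear beyond."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r
        else:
            # Linear branch: the gradient magnitude is capped at delta.
            total += delta * (r - 0.5 * delta)
    return total / len(y_true)


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Affine map W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp_forward(x: Sequence[float], layers: Sequence[Layer]) -> float:
    """GELU after each hidden layer; linear single-unit output layer."""
    h = list(x)
    for layer in layers[:-1]:
        h = [gelu(v) for v in dense(h, layer)]
    return dense(h, layers[-1])[0]


if __name__ == "__main__":
    # A 10 mm/day miss: 0.5 * r^2 would give 50.0, the Huber term gives 13.875.
    print(huber_loss([0.0], [10.0]))
```

For a 10 mm/day error, the squared-error term 0.5 r² would be 50, whereas the Huber term is 1.5 × (10 − 0.75) = 13.875, so a single extreme event pulls the fit far less.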
Topographical, coordinate-based, and seasonality predictors are grouped, whereas the distance to the sea is kept as a separate predictor. The predictor importance assessments are compared using drop-out loss, which quantifies the decline in model performance when a predictor is removed and thus shows how relevant each predictor is to the model's predictions. The results indicate that the ECMWF forecasts are the most important predictor for all three climate types, with drop-out loss values of 0.531 for arid (B), 1.617 for temperate (C), and 0.901 for continental (D) climate zones. Seasonality contributes more to the predictions over continental climate zones (0.05, versus 0.02 for both arid and temperate zones). Another noteworthy result is that the distance-to-sea predictor negatively affects the model over arid zones (-0.03), while contributing positively over both continental (0.013) and temperate (0.102) zones. Moreover, over temperate climate zones the drop-out loss for distance to the sea (0.102) exceeds that of the WRF forecast (0.076). This might be related to the average distance to the sea (0.99 degrees over temperate, 1.66 over arid, and 1.72 over continental zones). Likewise, topographical parameters have a positive effect over arid (0.003) and continental (0.014) zones but a negative effect over temperate (-0.012) zones. These results indicate both that multi-model machine learning designs can be beneficial for complex datasets and that the influence of predictors can vary across input clusters.
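The drop-out loss comparison above can be illustrated with a short sketch. The abstract does not state how a predictor is "removed"; this sketch assumes the common permutation approach (the one used, for example, by the DALEX package's `model_parts`), with RMSE as the loss and with each predictor group permuted jointly. The function names, group names, and toy data are hypothetical. Under this definition a negative value, such as the -0.03 reported for distance to the sea over arid zones, means the loss after scrambling was lower than the baseline loss.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]
Model = Callable[[Row], float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def dropout_loss(model: Model, X: Sequence[Row], y: Sequence[float],
                 groups: Dict[str, List[int]], n_repeats: int = 10,
                 seed: int = 0) -> Dict[str, float]:
    """Mean RMSE increase when each predictor group is permuted jointly.

    Positive: the model relies on the group.
    Negative: the model scored better with the group scrambled.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [model(list(row)) for row in X])
    result: Dict[str, float] = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            perm = list(range(len(X)))
            rng.shuffle(perm)
            X_perm = [list(row) for row in X]
            for i, j in enumerate(perm):
                # One shared permutation for every column in the group keeps
                # within-group relations (e.g. day of year vs. month) intact.
                for c in cols:
                    X_perm[i][c] = X[j][c]
            score = rmse(y, [model(row) for row in X_perm])
            increases.append(score - baseline)
        result[name] = sum(increases) / n_repeats
    return result


if __name__ == "__main__":
    # Hypothetical toy model that only uses column 0 (a stand-in forecast predictor).
    def toy_model(row: Row) -> float:
        return 2.0 * row[0]

    X = [[float(i), float(i % 3), float(i % 5)] for i in range(20)]
    y = [2.0 * i for i in range(20)]
    groups = {"forecast": [0], "seasonality": [1, 2]}
    # "seasonality" gives 0.0 because the toy model ignores columns 1 and 2.
    print(dropout_loss(toy_model, X, y, groups))
```

Permuting, rather than deleting a column and retraining, keeps the trained model fixed, so the score isolates how much that fitted model relies on the group; whether the study used permutation or retraining is not stated in the abstract.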
