Challenges in Applying Machine Learning Models for Hydrological Inference: A Case Study for Flooding Events Across Germany

Lennart Schmidt,Sabine Attinger,Rohini Kumar,Falk Heße

doi:10.1029/2019wr025924

Abstract

Machine learning (ML) algorithms are increasingly used in Earth and environmental modeling studies owing to the ever‐increasing availability of diverse data sets and computational resources as well as advances in the algorithms themselves. Despite gains in predictive accuracy, the usefulness of ML algorithms for inference remains elusive. In this study, we employ two popular ML algorithms, artificial neural networks and random forest, to analyze a large data set of flood events across Germany, with the goals of assessing their predictive accuracy and their ability to provide insights into hydrologic system functioning. The results of the ML algorithms are contrasted against a parametric approach based on multiple linear regression. For the analysis, we employ a model‐agnostic framework named Permuted Feature Importance to derive the influence of the models' predictors. This allows us to compare the results of different algorithms for the first time in the context of hydrology. Our main findings are that (1) the ML models achieve higher prediction accuracy than linear regression, (2) the results reflect basic hydrological principles, but (3) further inference is hindered by the heterogeneity of results across algorithms. Thus, we conclude that the problem of equifinality, as known from classical hydrological modeling, also exists for ML and severely hampers its potential for inference. To account for the observed problems, we propose that ML‐based inference should be carried out using multiple algorithms and multiple methods, of which the latter should be embedded in a cross‐validation routine.
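The idea behind a permutation-based, model-agnostic importance measure can be sketched in a few lines. The example below is a toy illustration only, not the study's implementation or data: the predictor names (`precip`, `wetness`, `unrelated`), the synthetic data, and the two stand-in models `model_a` and `model_b` are all assumptions made for this sketch. Importance is taken here as the mean increase in mean squared error after shuffling one predictor column; model fitting and cross-validation are omitted.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when one predictor column is shuffled.

    `predict` is any callable mapping one row (a list of floats) to a
    prediction, so the same routine works for any model type.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(mean(increases))
    return importances


# Synthetic data (hypothetical): two strongly correlated predictors and
# one predictor that carries no information about the target.
data_rng = random.Random(42)
X, y = [], []
for _ in range(200):
    precip = data_rng.random()
    wetness = precip + data_rng.gauss(0, 0.05)  # nearly a copy of precip
    unrelated = data_rng.random()
    X.append([precip, wetness, unrelated])
    y.append(2.0 * precip + data_rng.gauss(0, 0.1))


def model_a(row):
    """Stand-in 'fitted' model that relies on precip only."""
    return 2.0 * row[0]


def model_b(row):
    """Stand-in 'fitted' model that relies on wetness only."""
    return 2.0 * row[1]


importances_a = permutation_importance(model_a, X, y)
importances_b = permutation_importance(model_b, X, y)
print("model_a:", importances_a)
print("model_b:", importances_b)
```

In this constructed case both stand-in models fit the target about equally well, yet each attributes all importance to a different predictor. This is one simple way in which importance rankings can diverge across models when predictors are correlated.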

Highlights

The rapid progress made in the field of machine learning (ML) is arguably the most relevant current development for the field of hydrology.
To account for the observed problems, we propose that ML-based inference should be carried out using multiple algorithms and multiple methods, of which the latter should be embedded in a cross-validation routine.
While the training for random forest (RF) and artificial neural network (ANN) was based on a validation procedure, the linear model (LM) calibration was based on the training data set only.

Summary

Introduction

The rapid progress made in the field of machine learning (ML) is arguably the most relevant current development for the field of hydrology. Examples of its application include forecasting of urban water demand (Herrera et al., 2010), estimation of flow duration at ungauged sites (Booker & Snelder, 2012), streamflow classification (Peñas et al., 2014), and streamflow simulation (Gudmundsson & Seneviratne, 2015; Shortridge et al., 2016). Regarding the latter, Kratzert et al. (2019) have recently demonstrated the high potential of ML models for rainfall-runoff modeling, even when applied to ungauged basins.

Methods

Results

Conclusion