Monthly Rainfall-Runoff Modeling at Watershed Scale: A Comparative Study of Data-Driven and Theory-Driven Approaches

Won Chang,Xi Chen

doi:10.3390/w10091116

Abstract

Data-driven machine learning approaches have been rapidly developed in the past 10 to 20 years and applied to various problems in the field of hydrology. To investigate the capability of data-driven approaches in rainfall-runoff modeling in comparison to theory-driven models, we conducted a comparative study of simulated monthly surface runoff at 203 watersheds across the contiguous USA using a conceptual model, the proportionality hydrologic model, and a data-driven Gaussian process regression model. With the same input variables of precipitation and mean monthly aridity index, the two models showed similar performance. We then introduced two more input variables in the data-driven model: potential evaporation and the normalized difference vegetation index (NDVI), which were selected based on hydrologic knowledge. The modified data-driven model performed much better than either the conceptual or original data-driven model. A sensitivity analysis was conducted on all three models tested in this study, which showed that surface runoff responded positively to increased precipitation. However, a confounding effect on surface runoff sensitivity was found among mean monthly aridity index, potential evaporation, and NDVI. This confounding was caused by complex interconnections among energy supply, vegetation coverage, and climate seasonality of the watershed system. We also conducted an uncertainty analysis on the two data-driven models, which showed that both models had reasonable predictability within the 95% confidence interval. With the additional two input variables, the modified data-driven model had lower prediction uncertainty and higher prediction accuracy.

Highlights

With the rapid development of computational capability, data-driven machine learning methods have become more popular in the past decade in all fields related to data and modeling, including hydrology (e.g., [1,2,3,4,5,6,7])
We used the exceedance frequency curve of Nash–Sutcliffe efficiency (NSE) to show the overall performance of proportionality hydrologic model (PHM), Gaussian process regression (GPR), and extended Gaussian process regression (EGPR) in 203 MOPEX watersheds in the validation period (Figure 2)
This result indicates that with the same amount of input information, our data-driven model achieved a comparable level of modeling accuracy on simulated monthly surface runoff to our theory-driven conceptual hydrologic model

Summary

Introduction

With the rapid development of computational capability, data-driven machine learning methods have become more popular in the past decade in all fields related to data and modeling, including hydrology (e.g., [1,2,3,4,5,6,7]). Data-driven approaches do not rely directly on explicit physical knowledge of the target process. Instead, they build a purely empirical model based on observed relationships between input and output variables. Using various learning algorithms (e.g., [8,9,10,11,12]), data-driven approaches provide a flexible way to model complex phenomena such as runoff generation. Sun et al [13]

Objectives

Results

Discussion

Conclusion