Abstract

Some machine learning (ML) methods, such as classification trees, are useful tools for generating hypotheses about how hydrologic systems function. However, data limitations dictate that ML alone often cannot differentiate between causal and associative relationships. For example, a previous ML analysis suggested that soil thickness is the key physiographic factor determining the storage-streamflow correlations in the eastern US. This conclusion is not robust, especially when the data are perturbed, and there are alternative, competing explanations, including soil texture and terrain slope. At the same time, typical causal analysis based on process-based models (PBMs) is inefficient and susceptible to human bias. Here we demonstrate a more efficient and objective analysis procedure in which ML is first applied to generate data-consistent hypotheses, and a PBM is then invoked to verify these hypotheses. We employed a surface-subsurface processes model and conducted perturbation experiments to implement these competing hypotheses and assess the impacts of the changes. The experimental results strongly support the soil thickness hypothesis over the terrain slope and soil texture hypotheses, which involve co-varying and coincidental factors. Thicker soil permits larger saturation excess and longer system memory that carries wet season water storage to influence dry season baseflows. We further suggest that this analysis could be formulated into a data-centric Bayesian framework. This study demonstrates that PBMs provide indispensable value for problems that ML cannot solve alone, and is meant to encourage more synergies between ML and PBMs in the future.

Highlights

  • Basin water storage has deep connections with streamflow (Reager et al., 2014; Fang and Shen, 2017)

  • After demonstrating the performance of the PAWS+CLM (Community Land Model) model for the Susquehanna River basin, we show results from the perturbation experiments

  • While soil thickness was the most frequent predictor of the Storage-Streamflow Correlation Spectrum (SSCS) difference between class #1 and class #3 basins (Figures 4A,B), we found that soil texture (Figures 4C,D display the result for sand percentage) and terrain slope (Figures 4E,F) are competing hypotheses



Introduction

Basin water storage has deep connections with streamflow (Reager et al., 2014; Fang and Shen, 2017). Terrestrial water storage anomalies (TWSA) data could, under certain circumstances, be used to increase flood forecast lead time (Reager et al., 2015). From a physical hydrologic point of view, more water stored in a basin could mean a higher groundwater table or wetter soils, which lead to more runoff source areas (Dingman, 2015). Fang and Shen (2017) (hereafter FS17; see section The Background Story for more description) analyzed the correlation between TWSA annual extrema and different streamflow percentiles in a year, and found very interesting patterns in these correlations over the conterminous United States (CONUS). Why are there wildly different storage-streamflow relationships, i.e., what physical factors caused them? Our limited understanding of this question has hampered our use of water storage and groundwater data in flood forecasting.
