Machine learning (ML) is increasingly perceived as a futuristic, superior data-driven approach to scientific discovery. It has already demonstrated remarkable performance in forecasting and prediction, yet its role in enhancing our understanding of hydrological processes remains underexplored. Hydrological interpretation has traditionally relied heavily on model-dependent methods that focus on the predictive accuracy of ML models. Because hydrological models are built on a collection of assumptions and simplifications, model-dependent approaches may suffer from limited model realism, adequacy, and accuracy, as well as from equifinality. To address this gap, this study presents an ML approach that works in a model-independent context, operating directly on hydroclimatic data collected by monitoring systems. We apply this model-independent interpretation approach to a carefully designed set of hydrologic data collected across the contiguous United States to address the following questions: (1) What are the primary controls on runoff-generation mechanisms, and how can these controls be attributed to catchment properties? (2) How, and under what circumstances, can the history of climate variables, such as precipitation, serve as a surrogate for present-time state variables, such as soil moisture and snowpack? We show that the ML approach helps distinguish catchments characterized by strong overland flow, interflow, or baseflow components, as well as those driven primarily by rainfall, snowmelt, or a mix of the two. We further show that surrogate variables typically used in hydrology may fall short in representing the dynamics of catchments that exhibit a complex interplay of rain and snow.