Hydrological Significance of Input Sequence Lengths in LSTM-Based Streamflow Prediction

Cesar Alvarez Diaz, Farzad Hosseini Hossein Abadi, Cristina Prieto Sierra, Grey Nearing, Martin Gauch

doi:10.5194/egusphere-egu24-571

Abstract

Hydrological modeling of flashy, flood-prone catchments represents a significant practical challenge. Recent applications of deep learning, specifically Long Short-Term Memory networks (LSTMs), have demonstrated notable capability in delivering accurate hydrological predictions at daily and hourly time intervals (Gauch et al., 2021; Kratzert et al., 2018). In this study, we use a multi-timescale LSTM (MTS-LSTM; Gauch et al., 2021) to predict hydrographs in flashy catchments at hourly time scales. Our primary focus is to investigate the influence of model hyperparameters on the performance of regional streamflow models. We present methodological advancements through a practical application: predicting streamflow in 40 catchments within the Basque Country (northern Spain). Our findings show that 1) hourly and daily streamflow predictions exhibit high accuracy, with Nash-Sutcliffe Efficiency (NSE) reaching values as high as 0.941 and 0.966 for daily and hourly data, respectively; and 2) hyperparameters associated with the length of the input sequence exert a substantial influence on the performance of a regional model. Following systematic hyperparameter tuning, the consistently optimal regional values were identified as 3 years for daily data and 12 weeks for hourly data. Principal component analysis (PCA) shows that the first principal component (PC1) explains 12.36% of the variance among the 12 hyperparameters. Within this set of hyperparameters, the input sequence length for hourly data exhibits the highest loading on PC1, with a value of -0.523; the loading of the input sequence length for daily data is also very high (-0.36). This suggests that these hyperparameters strongly contribute to model performance. Furthermore, when using a catchment-scale magnifier to determine optimal hyperparameter settings for each catchment, distinctive sequence lengths emerge for individual basins.
This underscores the necessity of customizing input sequence lengths based on the “uniqueness of the place” (Beven, 2020), suggesting that each catchment may demand specific, hydrologically meaningful daily and hourly input sequence lengths tailored to its unique characteristics. In essence, the true input sequence length of a catchment may encapsulate hydrological information pertaining to water transit over short- and long-term periods within the basin. Notably, the regional daily sequence length aligns with the highest local daily sequence values across all catchments. In summary, our investigation stresses the critical role of the input sequence length as a hyperparameter in LSTM networks. More broadly, this work is a step towards better understanding and achieving accurate hourly predictions using deep learning models.

Keywords

Hydrological modeling; Streamflow prediction; LSTM networks; Hyperparameter configurations; Input sequence lengths
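
For readers unfamiliar with the accuracy metric reported in the abstract, the Nash-Sutcliffe Efficiency can be illustrated with a minimal sketch. It uses the standard definition, NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))², so a value of 1 indicates a perfect match and 0 indicates a simulation no better than the mean of the observations. The function name and the discharge values are hypothetical, chosen only for illustration; they are not the authors' code or data.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Return NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one time step is required")

    mean_obs = sum(observed) / len(observed)
    # Squared error of the simulation against the observations.
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    obs_spread = sum((o - mean_obs) ** 2 for o in observed)
    if obs_spread == 0:
        raise ValueError("NSE is undefined for constant observations")

    return 1.0 - squared_error / obs_spread


# Hypothetical discharge series (m^3/s), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [1.1, 1.9, 3.2, 3.8, 5.0]
nse = nash_sutcliffe_efficiency(observed, simulated)
print(f"NSE = {nse:.3f}")
```

In a regional setting such as the one described here, a score like this would typically be computed per catchment and then summarized across catchments; how the reported values were aggregated is not stated in the abstract.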

Full Text