River flow forecasting is important for flood prediction and the effective utilization of water resources. This study proposes a comprehensive methodology that enables both the creation of multiple datasets and the comparison of multiple learning models for forecasting river flows. First, the river flow time series was characterized in terms of its nature and pattern. Second, the complex data were decomposed into simpler modes using Ensemble Empirical Mode Decomposition (EEMD), which yields intrinsic mode functions (IMFs). As a benchmark, traditional time series decomposition was also applied to the same data. Third, the IMFs were combined into different components based on their data characteristics. These components, together with those from the traditional time series decomposition, form multiple datasets. The components of each dataset were then forecasted using deep learning (DL) models, mainly BPNN (Back-Propagation Neural Network), CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory), and ensemble prediction was used to obtain the final output. For verification and benchmarking, statistical models such as SARIMA were also applied to the datasets. A comprehensive evaluation and model comparison was carried out using the Diebold-Mariano (DM) test, k-fold cross-validation and the results of Prophet. In addition, the temporal convolutional network (TCN) and the gated recurrent unit (GRU) were modelled. Finally, all DL modelling was performed for two data situations: with missing data deleted and with missing data imputed. An empirical case study was conducted using the average daily time series data of the Neeleswaram Hydrological Observation (HO) station on the Periyar river in South India for the period 2017–2021.
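The decompose, forecast and recombine idea described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes an additive classical decomposition, a hypothetical seasonal period of 7 on synthetic data, and simple naive per-component forecasts standing in for the DL models. The function names `additive_decompose` and `forecast_by_components` are illustrative only.

```python
import numpy as np


def additive_decompose(y, period):
    """Split y into trend, seasonal and residual parts (additive model)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Moving-average trend; the series is padded with its edge values
    # so the trend has the same length as the input.
    pad = period // 2
    padded = np.pad(y, pad, mode="edge")
    kernel = np.ones(period) / period
    trend = np.convolve(padded, kernel, mode="valid")[:n]
    # Seasonal part: mean of the detrended series at each position in the
    # cycle, centred so the seasonal pattern sums to zero over one period.
    detrended = y - trend
    pattern = np.array([detrended[i::period].mean() for i in range(period)])
    pattern -= pattern.mean()
    seasonal = np.tile(pattern, n // period + 1)[:n]
    residual = y - trend - seasonal
    return trend, seasonal, residual


def forecast_by_components(trend, seasonal, period, horizon):
    """Forecast each component separately, then sum them (ensemble output)."""
    n = len(trend)
    trend_fc = np.full(horizon, trend[-1])          # naive: last trend value persists
    seasonal_fc = np.array([seasonal[(n + h) % period] for h in range(horizon)])
    residual_fc = np.zeros(horizon)                 # naive: residual forecast as zero
    return trend_fc + seasonal_fc + residual_fc


# Synthetic series (not real river flow data), assumed period of 7.
rng = np.random.default_rng(0)
t = np.arange(140)
series = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, t.size)

trend, seasonal, residual = additive_decompose(series, period=7)
forecast = forecast_by_components(trend, seasonal, period=7, horizon=14)
```

In the workflow the abstract describes, each component would instead be forecast by a trained model (BPNN, CNN, LSTM and so on), and EEMD would replace or complement the classical decomposition; only the final summation step is the same in spirit.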
The empirical assessment applied several tests across all datasets and learning models to gain further insights. Our results indicate that the deep learning-based models offer better predictions for complex time series than traditional statistical models. The DL models applied to the original time series data provided robust predictions with good performance, namely low RMSE and high correlation values. However, the performance of the DL models was not enhanced by integration with time series decomposition or EEMD. Furthermore, the empirical results and the k-fold cross-validation test indicate that all the DL models perform equivalently. Interestingly, the DL models outperform Prophet.
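As an illustration of the kind of pairwise comparison mentioned above, the following is a minimal sketch of RMSE and a Diebold-Mariano test. It assumes squared-error loss, a rectangular-window long-run variance using `horizon - 1` autocovariances, and a normal approximation for the p-value; the study may have used a different variant. The forecasts below are synthetic stand-ins, not results from the paper.

```python
import math

import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def diebold_mariano(actual, pred_a, pred_b, horizon=1):
    """DM statistic and two-sided p-value under squared-error loss.

    A negative statistic means forecast A has the lower average loss.
    """
    actual = np.asarray(actual, dtype=float)
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    # Loss differential between the two forecasts.
    d = (actual - pred_a) ** 2 - (actual - pred_b) ** 2
    n = len(d)
    d_bar = d.mean()
    centred = d - d_bar
    # Long-run variance: lag-0 autocovariance plus twice the
    # autocovariances at lags 1 .. horizon - 1.
    long_run_var = np.dot(centred, centred) / n
    for k in range(1, horizon):
        long_run_var += 2.0 * np.dot(centred[k:], centred[:-k]) / n
    stat = d_bar / math.sqrt(long_run_var / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Synthetic example: forecast A is deliberately less noisy than forecast B.
rng = np.random.default_rng(42)
observed = np.sin(np.linspace(0, 20, 200)) + 5.0
forecast_a = observed + rng.normal(0, 0.1, observed.size)
forecast_b = observed + rng.normal(0, 1.0, observed.size)

stat, p_value = diebold_mariano(observed, forecast_a, forecast_b)
```

A small p-value suggests the two forecasts differ in accuracy, while a large one is consistent with the "equivalent performance" finding reported for the DL models.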