Abstract

Data-driven flow-forecasting models, such as artificial neural networks (ANNs), are increasingly featured in research for their potential use in operational riverine flood warning systems. However, the distributions of observed flow data are imbalanced, resulting in poor prediction accuracy on high flows in terms of both amplitude and timing error. Resampling and ensemble techniques have been shown to improve model performance on imbalanced datasets, but the efficacy of these methods (individually or combined) has not been explicitly evaluated for improving high-flow forecasts. In this research, we systematically evaluate and compare three resampling methods: random undersampling (RUS), random oversampling (ROS), and the synthetic minority oversampling technique for regression (SMOTER); and four ensemble techniques: randomised weights and biases, bagging, adaptive boosting (AdaBoost), and least-squares boosting (LSBoost), on their ability to improve high stage prediction accuracy using ANNs. These methods are implemented both independently and in combined hybrid techniques, where the resampling methods are embedded within the ensemble methods. This systematic approach to embedding resampling methods is a novel contribution, and this research presents the first analysis of the effects of combining these methods on high stage prediction accuracy. Data from two Canadian watersheds (the Bow River in Alberta and the Don River in Ontario), representing distinct hydrological systems, are used as the basis for the comparison of the methods. The models are evaluated on overall performance and on typical and high stage subsets. The results indicate that resampling produces marginal improvements to high stage prediction accuracy, whereas ensemble methods produce more substantial improvements, with or without resampling. Many of the techniques produced an asymmetric trade-off between typical and high stage performance: reducing high stage error came at the cost of a disproportionately larger increase in error on typical stages. The methods proposed in this study highlight the diversity-in-learning concept and help support future studies on adapting ensemble algorithms for resampling. This research contains many of the first applications of such methods to flow forecasting and the first assessments of their efficacy in addressing the imbalance problem and heteroscedasticity, both of which are commonly observed in high-flow and flood-forecasting models.

Highlights

  • Data-driven models such as artificial neural networks (ANNs) have been widely and successfully used over the last three decades for hydrological forecasting applications (Govindaraju, 2000; Abrahart et al., 2012; Dawson and Wilby, 2001)

  • Changes to model performance are typically discussed relative to the individual learner unless explicit comparisons are specified

  • The effects that varying the HS threshold and ensemble size have on resampling and high stage performance are evaluated for the bagging and synthetic minority oversampling technique for regression (SMOTER)-Bagging models
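To make the role of the HS threshold and the ensemble size concrete, the following is a minimal sketch of how RUS and ROS, as named in the abstract, could be embedded in a bagging-style ensemble. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the synthetic gamma-distributed stage series, the 80th-percentile threshold, and the choice to balance the two subsets exactly are all hypothetical. SMOTER, which generates synthetic samples by interpolation, is not shown.

```python
import numpy as np

def split_by_threshold(y, hs_threshold):
    """Return index arrays for typical (below threshold) and high stage samples."""
    typical = np.flatnonzero(y < hs_threshold)
    high = np.flatnonzero(y >= hs_threshold)
    return typical, high

def random_undersample(y, hs_threshold, rng):
    """RUS: keep every high stage sample and an equal number of typical samples."""
    typical, high = split_by_threshold(y, hs_threshold)
    n_keep = min(len(typical), len(high))
    kept = rng.choice(typical, size=n_keep, replace=False)
    return np.concatenate([kept, high])

def random_oversample(y, hs_threshold, rng):
    """ROS: keep every sample and duplicate high stage samples until balanced."""
    typical, high = split_by_threshold(y, hs_threshold)
    n_extra = max(len(typical) - len(high), 0)
    extra = rng.choice(high, size=n_extra, replace=True)
    return np.concatenate([typical, high, extra])

def resampled_bagging_indices(y, hs_threshold, n_members, resampler, seed=0):
    """Draw one independently resampled training index set per ensemble member."""
    rng = np.random.default_rng(seed)
    return [resampler(y, hs_threshold, rng) for _ in range(n_members)]

# Hypothetical, positively skewed stage series (assumption, not study data).
y = np.random.default_rng(0).gamma(shape=2.0, scale=1.0, size=1000)
threshold = np.quantile(y, 0.8)  # assumed HS threshold: 80th percentile

members = resampled_bagging_indices(
    y, threshold, n_members=5, resampler=random_oversample
)
for i, idx in enumerate(members):
    share_high = np.mean(y[idx] >= threshold)
    print(f"member {i}: {len(idx)} samples, high stage share = {share_high:.2f}")
```

Each index set would be used to train one ANN, with the ensemble prediction taken as the mean over members. Raising the threshold shrinks the high stage subset, so RUS discards more typical samples and ROS duplicates each high stage sample more often.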


Summary

Introduction

Data-driven models such as artificial neural networks (ANNs) have been widely and successfully used over the last three decades for hydrological forecasting applications (Govindaraju, 2000; Abrahart et al., 2012; Dawson and Wilby, 2001). For flow-forecasting applications such as riverine flood warning systems, the accuracy of high stage predictions is more important than that of typical stage predictions. One cause of poor model accuracy on high stages is the scarcity of representative sample observations available with which to train such models (Moniz et al., 2017a). This is because stage data typically exhibit a strong positive skew, referred to as an imbalanced domain; there may only be a small number of flood observations within decades of samples.
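The imbalance described above can be quantified with two simple statistics: the sample skewness of the stage record and the fraction of observations at or above a high stage threshold. The sketch below is illustrative only; the lognormal series, the 95th-percentile threshold, and the variable names are assumptions and are not taken from the study's data.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based (population) skewness; positive values mean a long right tail."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    return np.mean(centred**3) / np.mean(centred**2) ** 1.5

# Hypothetical stage record with a long right tail (assumption, not real data).
rng = np.random.default_rng(42)
stage = rng.lognormal(mean=0.0, sigma=0.75, size=5000)

hs_threshold = np.quantile(stage, 0.95)  # assumed high stage threshold
high_fraction = np.mean(stage >= hs_threshold)

print(f"skewness = {sample_skewness(stage):.2f}")
print(f"high stage fraction = {high_fraction:.3f}")
```

A model trained to minimise average error on such a record sees far more typical stage samples than high stage ones, which is the imbalance that the resampling methods aim to offset.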
