Resampling and ensemble techniques for improving ANN-based high-flow forecast accuracy

Everett Snieder, Karen Abogadil, Usman T Khan

doi:10.5194/hess-25-2543-2021


Open Access

https://doi.org/10.5194/hess-25-2543-2021


Journal: Hydrology and Earth System Sciences
Publication Date: May 18, 2021
Citations: 19
License type: CC BY 4.0

Affiliation: York University

Abstract

Data-driven flow-forecasting models, such as artificial neural networks (ANNs), are increasingly featured in research for their potential use in operational riverine flood warning systems. However, the distributions of observed flow data are imbalanced, resulting in poor prediction accuracy on high flows in terms of both amplitude and timing error. Resampling and ensemble techniques have been shown to improve model performance on imbalanced datasets. However, the efficacy of these methods (individually or combined) has not been explicitly evaluated for improving high-flow forecasts. In this research, we systematically evaluate and compare three resampling methods, random undersampling (RUS), random oversampling (ROS), and the synthetic minority oversampling technique for regression (SMOTER), and four ensemble techniques, randomised weights and biases, bagging, adaptive boosting (AdaBoost), and least-squares boosting (LSBoost), on their ability to improve high stage prediction accuracy using ANNs. These methods are implemented both independently and in combined hybrid techniques, where the resampling methods are embedded within the ensemble methods. This systematic approach for embedding resampling methods is a novel contribution. This research presents the first analysis of the effects of combining these methods on high stage prediction accuracy. Data from two Canadian watersheds (the Bow River in Alberta and the Don River in Ontario), representing distinct hydrological systems, are used as the basis for the comparison of the methods. The models are evaluated on overall performance and on typical and high stage subsets. The results of this research indicate that resampling produces marginal improvements to high stage prediction accuracy, whereas ensemble methods produce more substantial improvements, with or without resampling.
Many of the techniques used produced an asymmetric trade-off between typical and high stage performance: reducing high stage error resulted in disproportionately larger error on typical stages. The methods proposed in this study highlight the diversity-in-learning concept and help support future studies on adapting ensemble algorithms for resampling. This research contains many of the first applications of such methods to flow forecasting and, moreover, examines their efficacy in addressing the imbalance problem and heteroscedasticity, which are commonly observed in high-flow and flood-forecasting models.
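
To make the two simplest resampling methods named in the abstract concrete, the following is a minimal sketch of random oversampling (ROS) and random undersampling (RUS) for a stage series. It is not the authors' implementation. The function names, the use of a single fixed high stage threshold, and the choice to balance the two subsets to equal size are all assumptions made for illustration. Both functions return sample indices, so the same indices can be applied to the model inputs and the targets.

```python
import random


def split_by_threshold(targets, threshold):
    """Return (typical, high) index lists; values >= threshold count as high stage."""
    typical = [i for i, y in enumerate(targets) if y < threshold]
    high = [i for i, y in enumerate(targets) if y >= threshold]
    return typical, high


def random_oversample(targets, threshold, rng):
    """ROS sketch: duplicate high stage samples until both subsets are equal in size.

    Balancing to equal size is an illustrative assumption, not taken from the paper.
    """
    typical, high = split_by_threshold(targets, threshold)
    if not high:
        raise ValueError("no samples at or above the threshold")
    n_extra = max(len(typical) - len(high), 0)
    extra = rng.choices(high, k=n_extra)  # sampling with replacement
    return typical + high + extra


def random_undersample(targets, threshold, rng):
    """RUS sketch: discard typical stage samples until both subsets are equal in size."""
    typical, high = split_by_threshold(targets, threshold)
    n_keep = min(len(typical), len(high))
    kept = rng.sample(typical, n_keep)  # sampling without replacement
    return kept + high


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, positively skewed "stage" values (hypothetical data).
    stage = [rng.gammavariate(2.0, 1.0) for _ in range(1000)]
    threshold = sorted(stage)[800]  # illustrative threshold near the 80th percentile
    print(len(random_oversample(stage, threshold, rng)))
    print(len(random_undersample(stage, threshold, rng)))
```

ROS enlarges the training set while RUS shrinks it, so the two have different training costs and discard different amounts of information about the typical stage.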

Highlights

Data-driven models such as artificial neural networks (ANNs) have been widely and successfully used over the last 3 decades for hydrological forecasting applications (Govindaraju, 2000; Abrahart et al., 2012; Dawson and Wilby, 2001).
Changes to model performance are typically discussed relative to the individual learner unless explicit comparisons are specified.
The effects that varying the high stage (HS) threshold and the ensemble size have on resampling and high stage performance are evaluated for the bagging and synthetic minority oversampling technique for regression (SMOTER)-Bagging models.
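
The last highlight refers to two tunable quantities, the HS threshold and the ensemble size, in a bagging model with embedded resampling. The sketch below shows one way these two parameters could enter such a model. It is an assumption-laden illustration, not the SMOTER-Bagging algorithm from the paper: each bootstrap is drawn half from the high stage subset and half from the typical subset (a simple oversampling scheme swapped in for SMOTER), and a least-squares line stands in for the ANN base learner. All function names are hypothetical.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares line; a hypothetical stand-in for the ANN base learner."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def balanced_bootstrap(ys, threshold, rng):
    """Draw len(ys) indices with replacement, half high stage and half typical."""
    typical = [i for i, y in enumerate(ys) if y < threshold]
    high = [i for i, y in enumerate(ys) if y >= threshold]
    if not typical or not high:
        raise ValueError("threshold must leave samples on both sides")
    n_high = len(ys) // 2
    return rng.choices(high, k=n_high) + rng.choices(typical, k=len(ys) - n_high)


def fit_resampled_bagging(xs, ys, threshold, ensemble_size, seed=0):
    """Train ensemble_size members, each on its own balanced bootstrap."""
    rng = random.Random(seed)
    members = []
    for _ in range(ensemble_size):
        idx = balanced_bootstrap(ys, threshold, rng)
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members


def predict(members, x):
    """Ensemble prediction: the mean of the member predictions."""
    return fmean(slope * x + intercept for slope, intercept in members)
```

In this sketch, raising `threshold` shrinks the pool of high stage samples that each bootstrap draws from, and raising `ensemble_size` averages over more resampled members.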

Summary

Introduction

Data-driven models such as artificial neural networks (ANNs) have been widely and successfully used over the last 3 decades for hydrological forecasting applications (Govindaraju, 2000; Abrahart et al., 2012; Dawson and Wilby, 2001). For flow-forecasting applications such as riverine flood warning systems, the accuracy of high stage predictions is more important than that of typical stage predictions. One cause of poor model accuracy on high stages is the scarcity of representative sample observations available with which to train such models (Moniz et al., 2017a). This scarcity arises because stage data typically exhibit a strong positive skew, a condition referred to as an imbalanced domain; there may be only a small number of flood observations within decades of samples.
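
The imbalance described above can be quantified with two simple diagnostics: the skewness of the stage series and the fraction of samples above a chosen high stage threshold. The sketch below computes both on synthetic data. The lognormal series, the threshold choice, and the function names are assumptions for illustration and do not come from the study's datasets.

```python
import random
from statistics import fmean, pstdev


def sample_skewness(values):
    """Population skewness: the mean cubed standardised deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return fmean(((v - mean) / std) ** 3 for v in values)


def high_stage_fraction(values, threshold):
    """Fraction of samples at or above the high stage threshold."""
    return sum(v >= threshold for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic positively skewed series standing in for observed stage.
    stage = [rng.lognormvariate(0.0, 0.75) for _ in range(5000)]
    threshold = sorted(stage)[int(0.8 * len(stage))]  # illustrative threshold
    print("skewness:", sample_skewness(stage))
    print("high stage fraction:", high_stage_fraction(stage, threshold))
```

A positive skewness together with a small high stage fraction indicates that a model trained on the raw series sees far fewer high stage examples than typical ones.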

