
Abstract. Design flood estimation is a fundamental task in hydrology. In this study, we propose a machine-learning-based approach to estimate design floods globally. The approach involves three stages: (i) estimating at-site flood frequency curves for global gauging stations using the Anderson–Darling test and a Bayesian Markov chain Monte Carlo (MCMC) method; (ii) clustering these stations into subgroups using a K-means model based on 12 globally available catchment descriptors; and (iii) developing a regression model in each subgroup for regional design flood estimation using the same descriptors. A total of 11 793 stations globally were selected for model development, and three widely used regression models were compared for design flood estimation. The results showed that (1) the proposed approach achieved the highest accuracy for design flood estimation when all 12 descriptors were used for clustering, and regression performance improved as more descriptors were considered during training and validation; (2) support vector machine regression provided the highest prediction performance of the regression models tested, with a root mean square normalised error of 0.708 for 100-year return period flood estimation; (3) 100-year design floods in tropical, arid, temperate, cold and polar climate zones could be reliably estimated (i.e. within ±25 % error), with relative mean bias (RBIAS) values of −0.199, −0.233, −0.169, 0.179 and −0.091 respectively; (4) the machine-learning-based approach developed in this paper showed considerable improvement over the index-flood-based method introduced by Smith et al. (2015, https://doi.org/10.1002/2014WR015814) for design flood estimation at global scales; and (5) the average RBIAS in estimation was less than 18 % for 10-, 20-, 50- and 100-year design floods.
We conclude that the proposed approach is a valid method to estimate design floods anywhere on the global river network, improving our prediction of the flood hazard, especially in ungauged areas.
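
As a rough illustration of the two error measures quoted in the abstract, the sketch below computes a relative mean bias and a root mean square normalised error for a handful of stations. It assumes the common definitions (mean of the relative errors, and root mean square of the relative errors), which may differ in detail from those used in the paper. The function names and flood values are hypothetical.

```python
import math

def rbias(estimated, observed):
    """Relative mean bias: mean of (estimate - observation) / observation.
    Assumed definition; negative values indicate underestimation on average."""
    errors = [(e - o) / o for e, o in zip(estimated, observed)]
    return sum(errors) / len(errors)

def rmsne(estimated, observed):
    """Root mean square normalised error: RMS of the relative errors.
    Assumed definition."""
    squared = [((e - o) / o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical 100-year design floods (m^3/s) at three stations.
observed = [100.0, 200.0, 400.0]
estimated = [80.0, 220.0, 300.0]

print(f"RBIAS = {rbias(estimated, observed):.3f}")
print(f"RMSNE = {rmsne(estimated, observed):.3f}")
```

Under these definitions, an RBIAS of −0.199 would correspond to an average underestimation of about 20 %, which falls inside the ±25 % band mentioned above.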


  • The Mann–Kendall (MK), standard normal homogeneity (SNHT) and Anderson–Darling (AD) tests were applied to these stations

  • The selected stations were clustered into several subgroups based on a K-means clustering model, and the discordancy measures described in Sect. 3.1.2 were further applied to these stations in each subgroup
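
The cluster-then-regress idea behind stages (ii) and (iii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain Lloyd's K-means, and it swaps in a one-variable least-squares line per cluster in place of the support vector machine regression reported in the paper. It also omits the discordancy screening. The descriptors, flood values and function names are all hypothetical.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every station to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def fit_regional_models(descriptors, floods, k):
    """Cluster stations, then fit one regression per cluster.
    Simplification: only the first descriptor is used as the predictor."""
    centroids, labels = kmeans(descriptors, k)
    models = {}
    for j in set(labels):
        xs = [d[0] for d, lab in zip(descriptors, labels) if lab == j]
        ys = [q for q, lab in zip(floods, labels) if lab == j]
        models[j] = fit_line(xs, ys)
    return centroids, labels, models

def predict(site, centroids, models):
    """Assign an ungauged site to its nearest cluster and apply that model."""
    intercept, slope = models[nearest(site, centroids)]
    return intercept + slope * site[0]

# Hypothetical stations: two descriptors each, forming two obvious groups.
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.2)]
# Hypothetical at-site design floods (arbitrary units), linear within each group.
floods = [1.0, 1.4, 1.2, 5.0, 4.8, 5.1]

centroids, labels, models = fit_regional_models(descriptors, floods, k=2)
estimate = predict((0.15, 0.1), centroids, models)
print(labels, round(estimate, 3))
```

Fitting a separate model per subgroup lets each regression specialise on hydrologically similar catchments. An ungauged site only needs its descriptors to be assigned to a subgroup and receive an estimate.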


Flood hazard is the primary weather-related disaster worldwide, affecting 2.3 billion people and causing USD 662 billion in economic damage between 1995 and 2015 (CRED and UNISDR, 2015). Flood hazard models are mature tools to identify flood-prone areas and have been widely used in flood risk management at catchment or regional scales (Hammond et al., 2015; Teng et al., 2017). With the development of new remote sensing techniques and an increase in computing power, global flood hazard models (GFHMs) have become a practical reality; they have been successfully applied to large-scale flood mapping and validated in several countries (Bates et al., 2020; Schumann et al., 2018). GFHMs can identify flood-prone areas in ungauged basins and provide a consistent and comprehensive understanding of the flood hazard at national, continental and global scales.
