Abstract

Data-driven flood susceptibility modeling is an efficient way to map the spatial distribution of flood likelihood. The quality of a flood susceptibility model depends on the learning technique and on the data used for learning. The performance of learning techniques has been extensively examined. However, to date, the impact of data sampling strategies has received limited attention. Random sampling is widely favored because of its ease of use, but it treats flood-related data as tabular and ignores their spatial dimension. Flood occurrence is typically uneven over space. Therefore, non-flood sampling should not be completely random. To represent the impact of the spatial dimension, this study proposed a new sampling approach based on spatial dependence, called inverse-occurrence sampling. It selects more non-flood data in low-risk areas than in high-risk areas. The new sampling approach was compared with random and stratified sampling, using six machine learning techniques in two urban areas in Guangzhou, China, with distinct flood mechanisms: Tianhe (flood density 1.5/km², clustered distribution, average slope 9.02°, downtown district) and Panyu (flood density 0.15/km², random distribution, average slope 4.55°, suburban district). The learning techniques were support vector machine (SVM), random forest (RF), artificial neural networks (ANNs), convolutional neural networks (CNNs), CNN-SVM, and CNN-RF. The main findings of this study were as follows: (1) Sampling approaches had a greater impact on model performance than learning techniques in terms of area under the receiver operating characteristic curve (AUC). The AUC variations caused by learning techniques ranged from 0.04 to 0.09, whereas the AUC variations caused by sampling approaches were between 0.15 and 0.22, all larger than 0.1. (2) The new sampling approach outperformed the other two sampling approaches, yielding higher average AUC values and smaller AUC variations.
This advantage was robust across multiple learning techniques and different flooding mechanisms. AUCs in the inverse group had a narrower range (0.14–0.18 in Tianhe and 0.35–0.39 in Panyu) than in the random group (0.24–0.28 in Tianhe and 0.43–0.53 in Panyu) and the stratified group (0.23–0.30 in Tianhe and 0.42–0.48 in Panyu). (3) The most accurate learning technique in terms of AUC was CNN-RF, followed by SVM, CNN-SVM, RF, CNN, and ANN. (4) ANN- and CNN-based models tended to produce polarized patterns in flood susceptibility maps, contradicting the expected increase in flood density with increasing susceptibility level. Flood density outliers tended to appear in the models derived using RF and CNN-RF. Finally, the newly proposed sampling approach is recommended for flood susceptibility mapping to reflect the impact of spatial dependence.
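
The idea behind inverse-occurrence sampling, selecting more non-flood data where floods are rare, can be illustrated with a minimal Python sketch. This is not the procedure used in the study, which the abstract does not specify. The Gaussian-kernel proxy for flood occurrence, the `1 / (1 + occurrence)` weighting, and the names `inverse_occurrence_sample`, `candidates`, `flood_points`, and `bandwidth` are all assumptions made for illustration.

```python
import numpy as np


def inverse_occurrence_sample(candidates, flood_points, n_samples,
                              bandwidth=1.0, seed=0):
    """Pick non-flood samples with probability decreasing in local flood occurrence.

    candidates:   (m, 2) array of candidate non-flood coordinates (hypothetical input)
    flood_points: (k, 2) array of recorded flood coordinates (hypothetical input)
    Returns the selected candidate indices and the selection probabilities.
    """
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates, dtype=float)
    flood_points = np.asarray(flood_points, dtype=float)

    # Assumed proxy for risk: Gaussian-kernel sum of nearby flood records.
    sq_dist = ((candidates[:, None, :] - flood_points[None, :, :]) ** 2).sum(axis=2)
    occurrence = np.exp(-sq_dist / (2.0 * bandwidth ** 2)).sum(axis=1)

    # Inverse weighting (assumed form): low occurrence gives a high weight.
    weights = 1.0 / (1.0 + occurrence)
    probs = weights / weights.sum()

    # Sample without replacement so no location is selected twice.
    idx = rng.choice(len(candidates), size=n_samples, replace=False, p=probs)
    return idx, probs


if __name__ == "__main__":
    # Toy data: floods clustered near the origin, candidates on a 10 x 10 grid.
    xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
    grid = np.column_stack([xs.ravel(), ys.ravel()])
    floods = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]])
    chosen, p = inverse_occurrence_sample(grid, floods, n_samples=20)
    print(grid[chosen])
```

In this sketch, `bandwidth` controls how far the influence of a flood record extends. A larger value spreads the down-weighting over a wider neighborhood. In practice the occurrence surface might instead come from a flood density map or a preliminary risk zoning.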
