Flooding is a very common natural hazard with catastrophic effects worldwide. Ensemble-based techniques have recently become popular in flood susceptibility modelling because of their greater strength and efficiency in predicting flood locations. This study therefore applied machine learning-based reduced-error pruning trees (REPTree) within Bagging (Bag-REPTree) and Random subspace (RS-REPTree) ensemble frameworks to the spatial prediction of flood susceptibility using a geographic information system (GIS). First, a flood spatial database was constructed from 363 flood locations and thirteen flood-influencing factors: altitude, slope angle, slope aspect, curvature, stream power index (SPI), sediment transport index (STI), topographic wetness index (TWI), distance to rivers, normalized difference vegetation index (NDVI), soil, land use, lithology, and rainfall. Next, correlation attribute evaluation (CAE) was used as the factor selection method to optimize the input factors. Finally, the receiver operating characteristic (ROC) curve, standard error (SE), 95% confidence interval (CI), and Wilcoxon signed-rank test were used to validate and compare the performance of the models. The RS-REPTree model showed the highest prediction capability for flood susceptibility assessment, with the highest area under the ROC curve (AUC; 0.949 for training, 0.907 for validation), the smallest SE (0.011, 0.023), and the narrowest 95% CI (0.928–0.970, 0.863–0.952) for the training and validation datasets, respectively. It was followed by the Bag-REPTree and REPTree models, in that order. The results also indicate that the ensemble models outperform the standalone REPTree model.
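
As a rough illustration of the two ensemble frameworks named in the abstract, the sketch below uses scikit-learn rather than the study's own tooling. Everything in it is an assumption made for illustration: the data are synthetic (thirteen generated features standing in for the influencing factors), a cost-complexity-pruned decision tree stands in for REPTree, and the ensemble size and feature fraction are arbitrary. The AUC values it prints say nothing about the study's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a flood database: 13 features, binary label
# (flood / non-flood). Not the study's data.
X, y = make_classification(
    n_samples=600, n_features=13, n_informative=6, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def make_tree():
    # Assumption: a cost-complexity-pruned CART tree is used here as a
    # substitute for REPTree, which scikit-learn does not provide.
    return DecisionTreeClassifier(ccp_alpha=0.005, random_state=0)


models = {
    # Single pruned tree (baseline).
    "Tree": make_tree(),
    # Bagging: each tree sees a bootstrap sample of the rows, all features.
    "Bagging": BaggingClassifier(
        make_tree(), n_estimators=50, bootstrap=True,
        max_features=1.0, random_state=0,
    ),
    # Random subspace: each tree sees all rows but a random half of the features.
    "RandomSubspace": BaggingClassifier(
        make_tree(), n_estimators=50, bootstrap=False,
        max_samples=1.0, max_features=0.5, random_state=0,
    ),
}

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_val)[:, 1]  # probability of the positive class
    auc[name] = roc_auc_score(y_val, scores)
    print(f"{name}: validation AUC = {auc[name]:.3f}")
```

The only difference between the two ensembles in this sketch is what gets resampled: Bagging resamples rows with replacement, while the random subspace variant keeps every row and draws a random subset of features for each tree.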