This study introduces an efficient model-stacking methodology that combines six diverse machine learning and statistical models with principal component analysis (PCA). The approach is applied to flash flood susceptibility mapping in the Karkheh Basin, Iran. The base models are random forest (RF), boosted regression trees (BRT), support vector machine (SVM), artificial neural network (ANN), generalized additive model (GAM), and the least absolute shrinkage and selection operator (Lasso); RF also serves as the meta-model for the stacking. The results revealed significant correlations among the predictions of the individual models, which could impair the meta-model’s efficacy. To address this, PCA was applied to the model predictions to generate decorrelated components as inputs to the meta-model, thereby enhancing prediction accuracy and robustness. Evaluation based on the area under the receiver operating characteristic curve (AUROC) showed that GAM outperformed all other individual models, with the highest score of 0.924, whereas RF and ANN scored lowest, both at 0.872. The performance differences among the individual models were nevertheless small. Notably, the PCA-based stacking approach (0.936) surpassed both traditional model stacking (0.912) and all individual models, which supports its use over conventional stacking techniques. Nonetheless, further research across varied applications is needed to establish how well this result generalizes.
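The PCA-stacking idea described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the Karkheh Basin dataset, and includes only three of the six base models (with `GradientBoostingClassifier` standing in for BRT). All variable names and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: the real predictors are not available here).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Three illustrative base models; the study uses six (RF, BRT, SVM, ANN, GAM, Lasso).
base_models = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),   # stand-in for BRT
    SVC(probability=True, random_state=0),
]

# Out-of-fold probabilities on the training set, so the meta-model is not
# trained on predictions the base models made for data they had already seen.
train_preds = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on the full training set to predict the test set.
test_preds = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models
])

# Decorrelate the (typically highly correlated) base-model predictions.
pca = PCA()  # keeps all components
train_pcs = pca.fit_transform(train_preds)
test_pcs = pca.transform(test_preds)

# RF meta-model trained on the principal components, scored by AUROC.
meta = RandomForestClassifier(n_estimators=200, random_state=0)
meta.fit(train_pcs, y_train)
auroc = roc_auc_score(y_test, meta.predict_proba(test_pcs)[:, 1])

# Correlation matrix of the components on the training data (near-diagonal).
pc_corr = np.corrcoef(train_pcs, rowvar=False)
print(f"PCA-stacking AUROC: {auroc:.3f}")
```

Replacing `train_pcs` and `test_pcs` with `train_preds` and `test_preds` in the meta-model step gives the conventional stacking baseline for comparison. Whether the PCA variant scores higher on any given dataset is an empirical question.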