Evaluation of re-sampling methods on performance of machine learning models to predict landslide susceptibility

Moslem Borji Hassangavyar, Hadi Eskandari Damaneh, Quoc Bao Pham, Nguyen Thi Thuy Linh, John Tiefenbacher, Quang-Vu Bach

doi:10.1080/10106049.2020.1837257

Abstract

This study tests whether three resampling methods (bootstrapping, random subsampling and cross-validation) improve the performance of eight machine-learning models relative to training on the original data. The models are boosted regression trees, flexible discriminant analysis, random forests, mixture discriminant analysis, multivariate adaptive regression splines, classification and regression trees, support vector machines and generalized linear models. Model results were evaluated using correlation (COR), area under the curve (AUC), true skill statistic (TSS), the receiver operating characteristic (ROC) and the probability of detection (POD). The evaluation showed that the bootstrapping technique improved the performance of all models. The bootstrapping-random forest model (COR = 0.75, AUC = 0.92, TSS = 0.80 and POD = 0.98) proved to be the best model for landslide prediction. Among the 18 contributing factors, distance from fault, curvature and precipitation were the most influential in all 32 models.

Highlights

Hazard prediction of landslides by eight machine-learning (ML) models.
Multiple morphometric, climatic, geologic, vegetation and human factors were used.
Tests the applicability of three resampling methods.
The performance of the ML models and the coupled models was assessed.
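
To make the resampling and evaluation terms in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard definitions POD = TP / (TP + FN) and TSS = sensitivity + specificity - 1, which the abstract does not spell out. The function names (`confusion_counts`, `pod`, `tss`, `bootstrap_indices`) and the toy labels are hypothetical and chosen only for illustration.

```python
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels (1 = landslide, 0 = no landslide)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def pod(y_true, y_pred):
    """Probability of detection (assumed definition): TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def tss(y_true, y_pred):
    """True skill statistic (assumed definition): sensitivity + specificity - 1."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement; unused indices form the out-of-bag set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


# Toy labels (hypothetical, not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print("POD:", pod(y_true, y_pred))
print("TSS:", tss(y_true, y_pred))

# One bootstrap replicate over 10 samples, seeded for reproducibility.
in_bag, out_of_bag = bootstrap_indices(10, random.Random(0))
print("in-bag:", in_bag, "out-of-bag:", out_of_bag)
```

In a bootstrapped workflow of this kind, a model would typically be fitted on the in-bag samples and scored on the out-of-bag samples, with the scores averaged over many replicates. Whether the study followed exactly that protocol is not stated in the abstract.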

Full Text