Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling

Wei Chen, Shuai Zhang, Renwei Li, Himan Shahabi

doi:10.1016/j.scitotenv.2018.06.389

Abstract

The main aim of the present study is to explore and compare three state-of-the-art data mining techniques, best-first decision tree, random forest, and naïve Bayes tree, for landslide susceptibility assessment in the Longhai area of China. First, a landslide inventory map with 93 landslide locations was randomly divided, with 70% of the area used for training landslide models and 30% used for the validation process. A spatial database of 14 conditioning factors was constructed in a geographic information system environment. Subsequently, the ReliefF method was employed to assess the prediction capability of the conditioning factors in landslide models. Multicollinearity among these factors was checked using the variance inflation factor, tolerance, and Pearson's correlation coefficient. Finally, the three resulting models were evaluated and compared using the area under the receiver operating characteristic (AUROC) curve, standard error, 95% confidence interval, accuracy, precision, recall, and F-measure. The random forest model showed the highest AUROC value (0.869), smallest standard error (0.025), narrowest 95% confidence interval (0.819–0.918), highest accuracy value (0.774), highest precision (0.662), and highest F-measure (0.662) for the training dataset. Thus, the random forest model is a promising technique that could be used for landslide susceptibility mapping.
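For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F-measure, and AUROC can be computed for a binary landslide/non-landslide classifier. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions, and the confidence interval uses a simple normal approximation (estimate ± 1.96 × standard error) that may differ from the procedure used in the study.

```python
# Illustrative sketch only; all names and data below are hypothetical.

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-measure from binary labels (1 = landslide)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def normal_ci(estimate, standard_error, z=1.96):
    """Normal-approximation confidence interval (assumed here, not taken from the paper)."""
    return estimate - z * standard_error, estimate + z * standard_error


# Toy data (made up): true classes and predicted susceptibility scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
ci_low, ci_high = normal_ci(0.869, 0.025)  # values reported in the abstract
```

Under this approximation, the AUROC of 0.869 and standard error of 0.025 reported above give an interval of roughly 0.820 to 0.918, which is close to the reported 0.819–0.918; the small difference is consistent with rounding of the published standard error.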

Full Text