Comparative performance assessment of landslide susceptibility models with presence-only, presence-absence, and pseudo-absence data

Dong-Mei Zhao, Cheng-Jing Liu, Juan Zhang, Yin-Ping Ding, Ying-Mei Qiu, Zhi-Lin Liu, Qiu-E Xu, Jin-Liang Wang, Yuan-Mei Jiao, Chang-Run Wu

doi:10.1007/s11629-020-6277-y

Abstract

Data quality plays an important role in statistical landslide susceptibility mapping, yet how different data types influence the performance of the resulting maps remains insufficiently studied. The goal of this study was to explore the effects of three data types, namely presence-only (PO), presence-absence (PA), and pseudo-absence (PAs) data, on the predictive capability of landslide susceptibility mapping. To this end, we conducted a case study in the landslide-prone Honghe County in Yunnan Province, China. A total of 428 landslide locations were selected as PO data. An equal number of non-landslide locations were generated by random sampling as PA data, and 10,000 sites were uniformly selected at random from each region as PAs data. Three landslide susceptibility models corresponding to the three data types were investigated: the information value model (IVM), the logistic regression (LR) model, and the maximum entropy (MaxEnt) model. Model performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), seven statistical indices (accuracy, sensitivity, false-positive rate, specificity, precision, Kappa, and F-measure), and a landslide density analysis. Our results indicated that the MaxEnt model using PAs data performed best and achieved the best fit, with the highest ROC-AUC values and statistical indices, followed by the IVM using only landslide (PO) data and the LR model using PA data. Using PAs data avoided the inherent tendency of PO data to over-predict by limiting the predicted area of high landslide susceptibility. Additionally, the random sampling design of the landslide PA data increased the uncertainty of landslide susceptibility mapping and influenced model performance. Our results therefore suggest that PAs sampling provides a useful data type in the absence of high-quality data. Finally, we summarize the principles, advantages, and disadvantages of the three data types to assist with model optimization and the improvement of predictive performance and fit.
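
As an illustration of the seven statistical indices named in the abstract, the following minimal Python sketch computes them from a binary confusion matrix using their standard textbook definitions. It is not the authors' code: the names `ConfusionCounts` and `susceptibility_indices` are hypothetical, the counts in the usage note are invented for illustration, and the sketch assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing predicted and observed landslide labels (hypothetical container)."""
    tp: int  # landslide sites predicted as landslide
    fp: int  # non-landslide sites predicted as landslide
    tn: int  # non-landslide sites predicted as non-landslide
    fn: int  # landslide sites predicted as non-landslide


def susceptibility_indices(c: ConfusionCounts) -> dict:
    """Return the seven indices using standard definitions (assumes nonzero denominators)."""
    n = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / n
    sensitivity = c.tp / (c.tp + c.fn)           # true-positive rate
    specificity = c.tn / (c.tn + c.fp)           # true-negative rate
    false_positive_rate = c.fp / (c.fp + c.tn)   # equals 1 - specificity
    precision = c.tp / (c.tp + c.fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_observed = accuracy
    p_expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)
    ) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "false_positive_rate": false_positive_rate,
        "specificity": specificity,
        "precision": precision,
        "kappa": kappa,
        "f_measure": f_measure,
    }
```

For example, calling `susceptibility_indices(ConfusionCounts(tp=80, fp=20, tn=70, fn=30))` on these invented counts returns a dictionary keyed by index name. Because the false-positive rate is the complement of specificity, a lower value is better for that index, unlike the other six.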

Full Text