The lack of reliable negative samples is an important factor limiting the quality of machine learning-based debris flow susceptibility mapping (DFSM). This paper proposes multiple negative-sample acquisition strategies for DFSM that take different sample representation forms into account. Three sample representation forms are considered: single grid, multi-grid, and watershed unit. The three negative-sample acquisition strategies are based on the support vector machine (SVM), the spy technique, and the isolation forest (IF). Each strategy assigns a value to every sample under its own assumptions, and reliable negative samples are then drawn from the samples whose values fall below a predefined threshold. Combining the three sample representation forms with the three negative-sample acquisition strategies yields nine datasets, each of which was used for random forest (RF) modeling. The models were evaluated using receiver operating characteristic (ROC) curves and related statistics. The results show that the strategy based on the spy technique is suitable for multiple datasets, while the IF-based strategy is well adapted to the watershed unit datasets. This study provides more options for improving the quality of datasets in DFSM, which can in turn improve the performance of machine learning models.
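The thresholding step described in the abstract can be illustrated with a minimal sketch. It assumes that each sample already carries a score from some upstream model (higher meaning more debris-flow-like) and that the threshold is set in the usual spy-technique manner, so that only a small fraction of known positives hidden among the unlabeled samples ("spies") fall below it. The function names, the `noise_fraction` parameter, and all scores are hypothetical and are not taken from the paper.

```python
import math


def spy_threshold(spy_scores, noise_fraction=0.1):
    """Return a score threshold such that at most `noise_fraction`
    of the spies score strictly below it (hypothetical helper)."""
    if not spy_scores:
        raise ValueError("spy_scores must not be empty")
    ordered = sorted(spy_scores)
    # Index of the first spy that must stay at or above the threshold.
    k = min(math.floor(noise_fraction * len(ordered)), len(ordered) - 1)
    return ordered[k]


def select_reliable_negatives(unlabeled_scores, threshold):
    """Return the IDs of unlabeled samples scoring strictly below the threshold."""
    return [sid for sid, score in unlabeled_scores.items() if score < threshold]


# Illustrative scores only (assumed, not from the paper).
spy_scores = [0.91, 0.85, 0.40, 0.78, 0.95, 0.66, 0.88, 0.72, 0.81, 0.59]
unlabeled_scores = {"u1": 0.05, "u2": 0.62, "u3": 0.35, "u4": 0.97, "u5": 0.58}

threshold = spy_threshold(spy_scores, noise_fraction=0.1)
reliable_negatives = select_reliable_negatives(unlabeled_scores, threshold)
print(threshold, reliable_negatives)
```

The same selection function would apply to scores from an SVM or an isolation forest, provided they are oriented so that lower values indicate a more likely negative; only the way the threshold is chosen would differ.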