Landslides are one of the major geohazards threatening human society. The objective of this study was to conduct a landslide hazard susceptibility assessment for Ruijin, Jiangxi, China, and to provide technical support to the local government for implementing disaster reduction and prevention measures. Machine learning approaches, e.g., random forests (RFs) and support vector machines (SVMs) were employed and multiple geo-environmental factors such as land cover, NDVI, landform, rainfall, lithology, and proximity to faults, roads, and rivers, etc., were utilized to achieve our purposes. For categorical factors, three processing approaches were proposed: simple numerical labeling (SNL), weight assignment (WA)-based and frequency ratio (FR)-based. Then 19 geo-environmental factors were respectively converted into raster to constitute three 19-band datasets, i.e., DS1, DS2, and DS3 from three different processes. Then, 155 observed landslides that occurred in the past decades were vectorized, among which 70% were randomly selected to compose a training set (TS1) and the remaining 30% to form a validation set (VS1). A number of non-landslide (no-risk) samples distributed in the whole study area were identified in low slope (<1–3°) zones such as urban areas and croplands, and also added to the TS1 and VS1 in the same ratio. For comparison, we used the FR approach to identify the no-risk samples in both flat and non-flat areas, and merged them into the field-observed landslides to constitute another pair of training and validation sets (TS2 and VS2) using the same ratio of 7:3. The RF algorithm was applied to model the probability of the landslide occurrence using DS1, DS2, and DS3 as predictive variables and TS1 and TS2 for training to obtain the SNL-based, WA-based, and FR-based RF models, respectively. Verified against VS1 and VS2, the three models have similar overall accuracy (OA) and Kappa coefficient (KC), which are 89.61%, 91.47%, and 94.54%, and 0.7926, 0.8299, and 0.8908, respectively. All of them are much better than the three models obtained by SVM algorithm with OA of 81.79%, 82.86%, and 83%, and KC of 0.6337, 0.655, and 0.660. New case verification with the recent 26 landslide events of 2017–2020 revealed that the landslide susceptibility map from WA-based RF modeling was able to properly identify the high and very high susceptibility zones where 23 new landslides had occurred, and performed better than the SNL-based and FR-based RF modeling, though the latter has a slightly higher OA and KC. Hence, we concluded that all three RF models achieve reasonable risk prediction, but WA-based and FR-based RF modeling deserves a recommendation for application elsewhere. The results of this study may serve as reference for the local authorities in prevention and early warning of landslide hazards.
Read full abstract