Abstract
The effectiveness of data-driven landslide susceptibility mapping depends on data integrity and advanced geospatial analysis; however, selecting the most suitable method and identifying the key regional factors remain challenging. To address this, this study assessed the performance of six machine learning models in mapping landslide susceptibility along the Karakoram Highway in northern Pakistan: Convolutional Neural Networks (CNNs), Random Forest (RF), Categorical Boosting (CatBoost), two CNN-based hybrid models (CNN–RF and CNN–CatBoost), and a Stacking Ensemble (SE) combining CNN, RF, and CatBoost. Twelve geospatial factors were examined, grouped into five categories: Topography/Geomorphology, Land Cover/Vegetation, Geology, Hydrology, and Anthropogenic Influence. A detailed landslide inventory of 272 occurrences was compiled to train the models. The stacking ensemble and hybrid models improved landslide susceptibility modeling, with the stacking ensemble achieving an AUC of 0.91. Hybrid modeling also improved accuracy: CNN–RF raised RF’s AUC from 0.85 to 0.89, and CNN–CatBoost raised CatBoost’s AUC from 0.87 to 0.90. Chi-square (χ²) values (9.8–21.2) and p-values (<0.005) confirm statistical significance across models. Approximately 20.70% of the study area was classified as high to very high risk, with the SE model performing best at detecting high-risk zones. The key factors influencing landslide susceptibility varied slightly across the models, while multicollinearity among variables remained minimal. The proposed modeling approach reduces uncertainties, enhances prediction accuracy, and supports decision-makers in implementing effective landslide mitigation strategies.