Abstract

This paper proposes decision tree models for determining the relative importance of 21 factors that cause landslides over a large area of Penang Island, Malaysia. The factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitudinal curvature, rugosity, plan curvature, elevation, rainfall precipitation, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and assessing factor importance, and their results are usually represented by an easy-to-interpret, tree-like structure. Four models were built using the Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick, Unbiased, Efficient Statistical Tree (QUEST) algorithms. The 21 factors were extracted using digital elevation models (DEMs) and used as input variables for the models. A data set of 137570 samples was selected for each variable, of which 68786 samples represent landslides and 68786 represent no landslides. Ten-fold cross-validation was used to test the models. Exhaustive CHAID achieved the highest accuracy (82.0%), followed by CHAID (81.9%), CRT (75.6%), and QUEST (74.0%). Across the four models, five factors were identified as the most important: slope angle, distance from drainage, surface area, slope aspect, and cross curvature.
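CRT grows a tree by choosing, at each node, the split that most reduces an impurity measure; for classification, the CART family typically uses Gini impurity. The sketch below is a minimal illustration under stated assumptions: a single continuous factor, binary labels (1 = landslide, 0 = no landslide), and hypothetical function names and toy data that do not come from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Find the binary split 'value <= t' that most reduces weighted Gini impurity.

    Returns (t, decrease); t is None if no split reduces impurity.
    """
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best_t, best_decrease = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - weighted > best_decrease:
            best_t, best_decrease = (lo + hi) / 2, parent - weighted
    return best_t, best_decrease


# Hypothetical toy data: slope angle (degrees) and landslide label (1 = landslide).
slope = [5, 8, 12, 20, 25, 31, 35, 40]
landslide = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_threshold(slope, landslide))  # (22.5, 0.5)
```

In CART-style trees, a factor's importance is often computed by summing such impurity decreases over the nodes where that factor is used, though implementations differ in the details; the summary does not state which importance measure the authors used.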

Highlights

  • Landslides are among the most destructive natural disasters, causing loss of life and billions of dollars in damage annually worldwide

  • In the Chi-square Automatic Interaction Detector (CHAID) method, the best split at each node is found by repeatedly merging any allowable pair of predictor categories that does not differ significantly with respect to the target variable, until every remaining pair differs significantly. CHAID naturally handles interactions between the independent variables, and these interactions can be read directly from the tree

  • The Exhaustive CHAID algorithm addresses a limitation of CHAID, whose merging can stop before the best split for a predictor is found, by continuing to merge categories, irrespective of significance level, until only two categories remain for each predictor
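The category-merging step described in the two CHAID highlights can be sketched as follows. This is a simplified illustration, assuming a nominal predictor (so every pair of categories is an allowable pair) and a binary target; the function names and counts are hypothetical. It omits parts of the full algorithms, such as Bonferroni-adjusted p-values, re-splitting of merged groups, and Exhaustive CHAID's final choice of the best grouping from the sequence of merges.

```python
import math
from itertools import combinations


def chi2_2x2(a, b):
    """Pearson chi-square statistic for two categories' (landslide, no-landslide) counts."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    rows = [sum(a), sum(b)]
    cols = [a[0] + b[0], a[1] + b[1]]
    stat = 0.0
    for i, row in enumerate((a, b)):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def merge_categories(counts, alpha=0.05, exhaustive=False):
    """Merge categories of one nominal predictor, CHAID-style (simplified).

    counts maps category -> (landslide_count, no_landslide_count).
    CHAID: merge the least significant pair while its p-value exceeds alpha.
    Exhaustive CHAID: keep merging regardless of significance until two groups remain.
    """
    groups = {(name,): tuple(c) for name, c in counts.items()}
    min_groups = 2 if exhaustive else 1
    while len(groups) > min_groups:
        # The least significant pair is the one with the smallest chi-square statistic.
        g1, g2 = min(combinations(groups, 2),
                     key=lambda pair: chi2_2x2(groups[pair[0]], groups[pair[1]]))
        if not exhaustive and p_value_df1(chi2_2x2(groups[g1], groups[g2])) <= alpha:
            break  # every remaining pair differs significantly, so CHAID stops
        merged = (groups[g1][0] + groups[g2][0], groups[g1][1] + groups[g2][1])
        del groups[g1], groups[g2]
        groups[g1 + g2] = merged
    return sorted(tuple(sorted(g)) for g in groups)


# Hypothetical counts for a geology factor: (landslide, no landslide).
geology = {"granite": (90, 10), "schist": (55, 45), "alluvium": (10, 90)}
print(merge_categories(geology))                   # [('alluvium',), ('granite',), ('schist',)]
print(merge_categories(geology, exhaustive=True))  # [('alluvium',), ('granite', 'schist')]
```

With these counts all three categories differ significantly, so the CHAID branch keeps a three-way split, while the Exhaustive CHAID branch keeps merging down to two groups.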


Summary

Introduction

Landslides are among the most destructive natural disasters, causing loss of life and billions of dollars in damage annually worldwide. With the development of GIS data-processing techniques, quantitative methods have been applied to landslide susceptibility analysis. These studies can be grouped by the technique used, such as probabilistic methods [14,15,16,17,18], logistic regression [19,20,21], and artificial neural networks [22,23,24,25]. Decision trees showed a good ability to identify the important factors causing landslides compared with the other models used. The experiment consisted of ten rounds, each using a different partition of the data into training and test sets.
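The ten rounds are consistent with the 10-fold cross-validation mentioned in the abstract: the samples are divided into ten folds, and each fold serves once as the test set while the other nine train the model. The following is a minimal sketch, assuming random (unstratified) folds; the one-threshold `fit_stump` classifier and the toy data are hypothetical stand-ins for the paper's decision tree models and data, not the authors' method.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(xs, ys, fit, k=10, seed=0):
    """Mean test accuracy over k rounds; each fold is the test set exactly once."""
    scores = []
    for test in k_fold_indices(len(xs), k, seed):
        if not test:
            continue  # possible only when there are fewer samples than folds
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_stump(xs, ys):
    """Hypothetical one-split classifier: predict 1 when x > t.

    t is the midpoint between consecutive distinct training values
    that gives the highest training accuracy.
    """
    vals = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum(int(x > t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x > best_t)


# Hypothetical, well-separated toy data (one factor, binary landslide label).
xs = list(range(10)) + list(range(100, 110))
ys = [0] * 10 + [1] * 10
print(cross_val_accuracy(xs, ys, fit_stump))  # 1.0
```

Any model can be plugged in through the `fit` argument, as long as it returns a prediction function; the reported accuracy is the mean over the ten test folds.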

Decision Trees
Study Area
Data Collection
Discussion
Findings
Conclusion