Comparison of tree-based machine learning algorithms for predicting liquefaction potential using canonical correlation forest, rotation forest, and random forest based on CPT data

Selcuk Demir, Emrehan Kutlug Sahin

doi:10.1016/j.soildyn.2021.107130

Abstract

This research investigates and compares the performance of three tree-based Machine Learning (ML) methods, Canonical Correlation Forest (CCF), Rotation Forest (RotFor), and Random Forest (RF), for predicting the liquefaction potential of soils from cone penetration test (CPT) case history datasets collected from previously published research. The ML models are trained and validated using the Stratified Random Sampling technique to form the training and test datasets, considering three sampling ratios: 50:50, 40:60, and 70:30. In addition, a comparative example on a single dataset shows the difference between Stratified Random Sampling and Simple Random Sampling, the most common probability-based sampling method. The predictive capabilities of the developed models are evaluated using Overall Accuracy, Kappa, Precision, Recall, and F-Measure values. Lastly, the Wilcoxon Signed-Rank Test and Pearson's Correlation Coefficient are adopted to assess whether the differences in accuracy among the tree-based ML methods are statistically significant. In general, the CCF, RotFor, and RF methods are found to be robust to variations in training sample size, and the performance metrics reveal that the CCF and RotFor methods perform slightly better than the conventional RF method. Based on the performance assessment, the CCF and RotFor methods, which to the best of our knowledge are applied here to soil liquefaction for the first time, are worth considering for the prediction of soil liquefaction.
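As an illustration of the sampling and evaluation steps named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes binary labels (1 = liquefied, 0 = not liquefied); the function names `stratified_split` and `binary_metrics`, the 0.7 training fraction, and the toy labels are hypothetical and chosen only for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Split indices into train/test while keeping each class's share.

    Hypothetical helper: each class is shuffled and split separately,
    so both subsets keep roughly the original class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(y_true, y_pred, positive=1):
    """Overall Accuracy, Cohen's Kappa, Precision, Recall and F-Measure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    n = tp + fp + fn + tn

    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Toy labels (made up): 60 liquefied and 40 non-liquefied cases.
    labels = [1] * 60 + [0] * 40
    train, test = stratified_split(labels, train_fraction=0.7)
    print(len(train), len(test))

    # Toy predictions (made up) to exercise the metric formulas.
    print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))
```

With Simple Random Sampling, the indices would instead be shuffled as one pool, so the class proportions in the training and test subsets could drift from those of the full dataset.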

Full Text