Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data.

Haohui Lu, Shahadat Uddin

doi:10.1371/journal.pone.0301541

Abstract

Many individual studies have observed the superiority of tree-based machine learning (ML) algorithms, but the literature lacks statistical validation of this superiority. This study addresses the gap by applying five ML algorithms to 200 open-access datasets from a wide range of research contexts. Specifically, it examines two tree-based algorithms (decision tree and random forest) and three non-tree-based algorithms (support vector machine, logistic regression and k-nearest neighbour). Paired-sample t-tests show that both tree-based algorithms perform better than each non-tree-based algorithm on all four performance measures considered (accuracy, precision, recall and F1 score), each at the p < 0.001 significance level. This superiority is consistent across both the model development and test phases. For further validation, the study also applied paired-sample t-tests to two subsets of the datasets: disease prediction (66 datasets) and university ranking (50 datasets). The tree-based algorithms again significantly outperformed the non-tree-based algorithms on all four performance measures in both contexts. We discuss the research implications of these findings in detail in this article.
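The paired-sample t-test named in the abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function name `paired_t_statistic` and the accuracy values are hypothetical and do not come from the study. Each position in the two lists is assumed to hold the scores of two algorithms on the same dataset.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, df) for a paired-sample t-test on two equal-length score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two pairs")
    # The test works on per-dataset differences, not on the raw scores.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies on six datasets (illustrative values only, not study data).
rf_acc = [0.91, 0.84, 0.78, 0.88, 0.95, 0.73]  # e.g. a random forest
lr_acc = [0.87, 0.80, 0.79, 0.83, 0.90, 0.70]  # e.g. logistic regression

t_stat, df = paired_t_statistic(rf_acc, lr_acc)
print(f"t = {t_stat:.3f}, df = {df}")
```

A positive t means the first algorithm scored higher on average across the paired datasets. Turning t into a p-value requires the Student t distribution with `df` degrees of freedom, which the standard library does not provide; where SciPy is available, `scipy.stats.ttest_rel` computes both the statistic and the p-value.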

Journal: PLOS ONE	Publication Date: Apr 18, 2024
Citations: 5	License type: CC BY 4.0

Similar Papers

Dataset meta-level and statistical features affect machine learning performance
Shahadat Uddin ... Haohui Lu
Scientific Reports | VOL. 14
19 Jan 2024

Predicting stroke, neurological and movement disorders using single and dual-task gait in Korean older population
Marco Recenti ... Seung Uk Ko
Gait & Posture | VOL. 105
24 Jul 2023

Pixel-wise classification in graphene-detection with tree-based machine learning algorithms
Woon Hyung Cho ... Jiseon Shin
Machine Learning: Science and Technology | VOL. 3
01 Dec 2022

An explainable model for the mass appraisal of residences: The application of tree-based Machine Learning algorithms and interpretation of value determinants
Muzaffer Can Iban
Habitat International | VOL. 128
31 Aug 2022
