Comparative Analysis of Machine Learning Algorithms Based on Variable Importance Evaluation

Hasan Yildirim

doi:10.53525/jster.988672

Abstract

One of the main goals of machine learning studies is to identify the variables that matter most for a specific research problem. Various algorithms have been developed for this purpose; Random Forest, Cubist, and MARS are the most common of these. Although classical statistical methods are useful, to a certain extent, for obtaining the importance of the variables that affect the output, machine learning algorithms may provide clearer and more precise results. In this study, the estimation results of Random Forest, Cubist, and MARS are compared on a real data set in terms of performance criteria such as mean squared error, the coefficient of determination, and mean absolute error. The results show that Random Forest and Cubist perform similarly to each other and better than MARS. Additionally, the ranking of the most important variables varies with the algorithm. The concordance between the algorithms is investigated from a statistical perspective and found to be satisfactory. Consequently, Random Forest, Cubist, and MARS can be considered effective and reasonable algorithms for both estimation performance and variable importance evaluation.
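
For readers unfamiliar with the three performance criteria named above, the following is a minimal sketch of their standard definitions in plain Python. The function names and the toy values are assumptions chosen for illustration; they are not taken from the study's data or code.

```python
from statistics import mean


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, purely illustrative (not from the study).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]

print(mean_squared_error(y_true, y_pred))
print(mean_absolute_error(y_true, y_pred))
print(r_squared(y_true, y_pred))
```

Lower mean squared error and mean absolute error indicate better estimation, while a coefficient of determination closer to 1 indicates that the model explains more of the variation in the output.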
