Abstract

One of the main goals in machine learning studies is to determine which variables matter most for a specific research problem. Various algorithms have been developed to achieve this goal, and Random Forest, Cubist, and MARS are among the most commonly used. Although classical statistical methods are useful, to some extent, for assessing the importance of the variables that affect the output, machine learning algorithms may provide clearer and more precise results. In this study, the estimation results of Random Forest, Cubist, and MARS are compared on a real data set in terms of performance criteria such as mean squared error, the coefficient of determination, and mean absolute error. The results show that Random Forest and Cubist perform similarly to each other and better than MARS. Additionally, the ranking of the most important variables varies with the algorithm. The concordance between the algorithms is investigated from a statistical perspective and found to be satisfactory. Consequently, Random Forest, Cubist, and MARS can be considered effective and reasonable algorithms for both estimation performance and variable importance evaluation.
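
The three performance criteria named above can be illustrated with a minimal sketch. It is not the study's code: the function name `regression_metrics` and the sample values are hypothetical, and the coefficient of determination is assumed to be defined as 1 - SS_res / SS_tot, which may differ from the definition the authors used.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and R^2 for paired observed and predicted values.

    Assumption: R^2 is computed as 1 - SS_res / SS_tot.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error: average of the squared residuals.
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n

    # Mean absolute error: average of the absolute residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Coefficient of determination: share of variance explained.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "mae": mae, "r2": r2}


# Hypothetical values for illustration only, not data from the study.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Applying the same function to each algorithm's predictions on a shared hold-out set would allow a like-for-like comparison. Lower MSE and MAE and a higher coefficient of determination indicate better estimation performance.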
