Abstract

The random forest algorithm is one of the most popular and widely used algorithms for classification and regression tasks. It combines the outputs of multiple decision trees to form a single result. In many applications, random forests achieve the highest accuracy on tabular data compared with other algorithms. However, random forests, and more precisely the decision trees they comprise, are usually built using the classic Shannon entropy. In this article, we consider the potential of deformed entropies, which are successfully used in the field of complex systems, to increase the prediction accuracy of random forest algorithms. We develop and introduce information gains based on the Rényi, Tsallis, and Sharma-Mittal entropies for classification and regression random forests. We test the proposed algorithm modifications on six benchmark datasets: three for classification and three for regression problems. For classification problems, the application of Rényi entropy improves the random forest prediction accuracy by 19-96% depending on the dataset, Tsallis entropy improves it by 20-98%, and Sharma-Mittal entropy improves it by 22-111% compared with the classical algorithm. For regression problems, the application of deformed entropies improves the prediction by 2-23% in terms of R² depending on the dataset.
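The deformed entropies named in the abstract have standard closed forms, and each reduces to Shannon entropy in the appropriate parameter limit. A minimal sketch of how they could replace Shannon entropy in a split's information-gain computation is shown below; the function names, parameter choices, and gain formula are illustrative assumptions, not taken from the paper itself:

```python
import math

def shannon(p):
    # Classic Shannon entropy: H = -sum p_i log p_i
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, alpha):
    # Rényi entropy: H_a = log(sum p_i^a) / (1 - a); -> Shannon as a -> 1
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)

def tsallis(p, q):
    # Tsallis entropy: H_q = (1 - sum p_i^q) / (q - 1); -> Shannon as q -> 1
    return (1 - sum(x ** q for x in p)) / (q - 1)

def sharma_mittal(p, alpha, beta):
    # Sharma-Mittal entropy, a two-parameter family generalizing both above
    s = sum(x ** alpha for x in p)
    return (s ** ((1 - beta) / (1 - alpha)) - 1) / (1 - beta)

def class_freqs(labels):
    # Empirical class probabilities for a list of labels
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n = len(labels)
    return [c / n for c in counts.values()]

def information_gain(parent, children, entropy):
    # Gain = H(parent) - weighted average of child entropies;
    # 'entropy' can be shannon or any deformed entropy (curried with its parameters)
    n = len(parent)
    weighted = sum(len(c) / n * entropy(class_freqs(c)) for c in children)
    return entropy(class_freqs(parent)) - weighted

# Example: a perfect split of a balanced binary node
gain = information_gain([0, 0, 1, 1], [[0, 0], [1, 1]],
                        lambda p: tsallis(p, q=2.0))
```

For a balanced binary parent split perfectly into pure children, every child entropy is zero, so the gain equals the parent's entropy regardless of which entropy functional is chosen; a tree builder would simply maximize this quantity over candidate splits.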
