Optimization of parallel random forest algorithm based on distance weight

Qinge Wang, Huihua Chen, Srikanta Patnaik

doi:10.3233/jifs-179965

Abstract

To address the long execution time and low parallelism of existing parallel random forest algorithms, this paper proposes an optimization method for the parallel random forest algorithm based on distance weights. First, training samples are drawn from the original data set by random selection, and a single decision tree is constructed from each extracted sample. The individual decision trees are then grouped, according to different grouping methods, to form a random forest. Finally, the distance weights of the training sample data set are calculated and used to weight the random forest model. Experimental results show that the optimized parallel random forest algorithm runs 110 000 ms faster than the unoptimized version, which substantially improves operating efficiency and effectively addresses the problems of the traditional random forest algorithm.
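
The steps outlined in the abstract (random sampling, per-sample tree construction, grouping into a forest, distance weighting) can be illustrated with a minimal Python sketch. The abstract does not define the distance weight, so everything specific here is an assumption made for illustration: each tree is a one-split stump rather than a full decision tree, trees are built concurrently with a thread pool, and a tree's vote is weighted by the inverse Euclidean distance between the query point and the centroid of that tree's bootstrap sample. The names `fit_forest`, `build_member`, and `predict` are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement from the original data set."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y):
    """Fit a one-split tree (a simplified stand-in for a full decision tree)."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback when no valid split exists: always predict the majority label.
    best = (len(y) + 1, 0, math.inf, majority, majority)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left)
            errors += sum(lab != r_lab for lab in right)
            if errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict_stump(stump, x):
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab


def build_member(job):
    """Build one forest member: a stump plus its bootstrap-sample centroid."""
    X, y, seed = job
    Xb, yb = bootstrap_sample(X, y, random.Random(seed))
    centroid = [sum(col) / len(Xb) for col in zip(*Xb)]
    return fit_stump(Xb, yb), centroid


def fit_forest(X, y, n_trees=15, workers=4, seed=0):
    """Build the trees concurrently; each job gets its own seed."""
    jobs = [(X, y, seed + i) for i in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(build_member, jobs))


def predict(forest, x):
    """Weighted vote. ASSUMPTION: weight = 1 / (1 + distance to centroid)."""
    votes = Counter()
    for stump, centroid in forest:
        weight = 1.0 / (1.0 + math.dist(x, centroid))
        votes[predict_stump(stump, x)] += weight
    return votes.most_common(1)[0][0]
```

A call such as `forest = fit_forest(X, y)` followed by `predict(forest, x)` returns the label with the largest weighted vote. Threads are used only to keep the sketch short; for CPU-bound tree construction in CPython, a process pool or a distributed framework would be the more realistic choice, and the paper's actual parallelization and weighting scheme may differ.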
