On Cesáro Averages for Weighted Trees in the Random Forest

Hieu Pham,Sigurður Olafsson

doi:10.1007/s00357-019-09322-8

Abstract

The random forest is a popular and effective classification method. It uses a combination of bootstrap resampling and subspace sampling to construct an ensemble of decision trees that are then averaged for a final prediction. In this paper, we propose a potential improvement on the random forest that can be thought of as applying a weight to each tree before averaging. The new method is motivated by the potential instability of averaging predictions of trees that may be of highly variable quality, and because of this, we replace the regular average with a Cesaro average. We provide both a theoretical analysis that gives exact conditions under which the new approach outperforms the traditional random forest, and numerical analysis that shows the new approach is competitive when training a classification model on numerous realistic data sets.

Full Text