Abstract

Random forest is an efficient and accurate classification model that makes decisions by aggregating a set of trees, either by voting or by averaging class posterior probability estimates. However, tree outputs may be unreliable in the presence of scarce data. The imprecise Dirichlet model (IDM) provides a workaround by replacing point probability estimates with interval-valued ones. This paper investigates a new tree aggregation method based on the theory of belief functions to combine such probability intervals, resulting in a cautious random forest classifier. In particular, we propose a strategy for computing tree weights based on the minimization of a convex cost function, which takes both determinacy and accuracy into account and makes it possible to adjust the level of cautiousness of the model. The proposed model is evaluated on 25 UCI datasets and is demonstrated to be more adaptive to noise in the training data and to achieve a better compromise between informativeness and cautiousness.
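As a rough illustration of the IDM step mentioned above (not the paper's belief-function aggregation or its weight-learning scheme), the following Python sketch shows how the IDM turns class counts in a tree leaf into probability intervals, with lower and upper bounds n_k/(N+s) and (n_k+s)/(N+s), and how interval dominance can then yield a cautious, possibly set-valued prediction. The function names and the choice of s are illustrative assumptions.

```python
import numpy as np

def idm_intervals(counts, s=2.0):
    """Imprecise Dirichlet model: interval-valued class probabilities.

    counts: observed class counts in a tree leaf.
    s: IDM hyperparameter controlling imprecision (s=1 or s=2 are common choices).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    lower = counts / (n + s)          # lower probability of each class
    upper = (counts + s) / (n + s)    # upper probability of each class
    return lower, upper

def non_dominated_classes(lower, upper):
    """Interval dominance: discard class k if some class j has lower[j] > upper[k]."""
    k_range = range(len(lower))
    return [k for k in k_range
            if not any(lower[j] > upper[k] for j in k_range if j != k)]

# Example: a leaf with 3 samples of class 0 and 1 sample of class 1.
lo, up = idm_intervals([3, 1], s=2.0)
print(lo, up)                          # [0.5, 0.167], [0.833, 0.5]
print(non_dominated_classes(lo, up))   # [0, 1]: too few data to commit to one class
```

With so few samples, neither class dominates the other, so a cautious classifier returns both rather than committing to a single label; as the leaf counts grow, the intervals shrink and the prediction becomes determinate.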
