Abstract

Random forests are currently among the most popular algorithms for supervised machine learning tasks. Because a forest aggregates many trees instead of a single one, the resulting model is no longer easy to understand and is often regarded as a black box. This paper is dedicated to the interpretability of random forest models by means of tree-based explanations. Two different concepts, namely most representative trees and surrogate trees, are analyzed with respect to both their ability to explain the model and their comprehensibility for humans. For this purpose, explanation trees are further extended to groves, i.e. small forests consisting of few trees. The results of an application to three real-world data sets underline the inherent trade-off between both requirements. Using groves allows the complexity of an explanation to be controlled while simultaneously analyzing its explanatory power.
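To illustrate the surrogate-tree concept mentioned above, the following minimal sketch (not taken from the paper; the data set, depth, and library calls are illustrative assumptions) fits a single shallow decision tree to the predictions of a random forest and measures how faithfully it reproduces the forest:

```python
# Generic sketch of a surrogate tree: a small, interpretable tree is
# trained to mimic the predictions of a black-box random forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# Black-box model: a random forest with many trees.
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Surrogate: a shallow tree trained on the forest's predictions, not on y.
# max_depth controls the complexity of the resulting explanation.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, forest.predict(X))

# Fidelity: how often the surrogate agrees with the forest.
fidelity = accuracy_score(forest.predict(X), surrogate.predict(X))
print(f"Surrogate fidelity to the forest: {fidelity:.3f}")
```

In this reading, a grove would correspond to a small set of such explanation trees rather than a single one, trading a little extra complexity for higher fidelity to the forest.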
