Abstract

Conventional decision trees have a number of favorable properties, including a small computational footprint, interpretability, and the ability to learn from little training data. However, they lack a key quality that has helped fuel the deep learning revolution: that of being end-to-end trainable. Kontschieder et al. (ICCV, 2015) have addressed this deficit, but at the cost of losing a main attractive trait of decision trees: the fact that each sample is routed along a small subset of tree nodes only. We here present an end-to-end learning scheme for deterministic decision trees and decision forests. Thanks to a new model and expectation–maximization training scheme, the trees are fully probabilistic at train time, but after an annealing process become deterministic at test time. In experiments we explore the effect of annealing visually and quantitatively, and find that our method performs on par with, or superior to, standard learning algorithms for oblique decision trees and forests. We further demonstrate on image datasets that our approach can learn more complex split functions than common oblique ones, and facilitates interpretability through spatial regularization.
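To make the annealing idea concrete, the following is a minimal sketch, not the authors' implementation: an oblique split is made probabilistic via a sigmoid of its linear split function, a steepness parameter plays the role of the annealing schedule, and at test time the argmax branch is followed deterministically. All names here (`soft_split`, `beta`, the toy leaf distributions) are illustrative assumptions.

```python
# Minimal sketch of a probabilistic oblique split with annealing.
# Not the authors' code; all names and values are illustrative.
import numpy as np

def soft_split(x, w, b, beta):
    """Probability of routing x to the left child.

    beta is the annealing (steepness) parameter: for small beta the
    routing is soft; as beta grows, the sigmoid approaches a hard
    step and the split becomes deterministic."""
    return 1.0 / (1.0 + np.exp(-beta * (np.dot(w, x) + b)))

# A depth-1 "tree" with one oblique split and two leaves, each
# holding a class-probability vector.
w, b = np.array([1.0, -1.0]), 0.0
leaf_left, leaf_right = np.array([0.9, 0.1]), np.array([0.2, 0.8])

x = np.array([0.3, 0.1])
for beta in [1.0, 10.0, 100.0]:        # a toy annealing schedule
    p_left = soft_split(x, w, b, beta)
    # Soft prediction: mixture of leaf distributions, weighted by
    # the probability of reaching each leaf.
    pred = p_left * leaf_left + (1 - p_left) * leaf_right
    print(beta, p_left, pred)

# Deterministic test-time routing: follow only the argmax branch.
hard_pred = leaf_left if np.dot(w, x) + b > 0 else leaf_right
print(hard_pred)
```

As `beta` grows, the soft mixture prediction converges to the prediction of the single leaf reached by the hard routing, which is what allows the tree to be probabilistic during training yet deterministic at test time.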

Highlights

  • Neural networks are currently the dominant classifier in computer vision (Russakovsky et al. 2015; Cordts et al. 2016), whereas decision trees and decision forests have proven their worth when training data or computational resources are scarce (Barros et al. 2012; Criminisi and Shotton 2013)

  • For quantitative comparison of our end-to-end learned oblique decision trees (E2EDT), we evaluate the performance on the multivariate but unstructured datasets used in Norouzi et al. (2015b) (Sect. 3.1)

  • We find that deterministic E2EDF achieves higher average accuracy than random forest (RF) and boosted trees (BT) on all datasets, and outperforms all other methods on MNIST


Introduction

Neural networks are currently the dominant classifier in computer vision (Russakovsky et al. 2015; Cordts et al. 2016), whereas decision trees and decision forests have proven their worth when training data or computational resources are scarce (Barros et al. 2012; Criminisi and Shotton 2013). One can observe that both neural networks and decision trees are composed of basic computational units, the perceptrons and nodes, respectively. A crucial difference between the two is that in a standard neural network, all units are evaluated for every input, whereas in a decision tree each sample is routed along a single root-to-leaf path, so only a small subset of the nodes is evaluated.
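As a concrete illustration of this difference, the sketch below (assumed details, not code from the paper) routes a sample through a small deterministic tree and counts split evaluations: only the nodes on one root-to-leaf path are touched, whereas a standard network layer evaluates all of its units for every input. The `Node` class and `route` function are hypothetical helpers for this example.

```python
# Illustrative sketch of conditional computation in a deterministic
# binary decision tree: a sample evaluates only depth-many split
# functions, not all internal nodes. Not code from the paper.
import numpy as np

class Node:
    def __init__(self, w=None, b=0.0, left=None, right=None, leaf=None):
        self.w, self.b = w, b
        self.left, self.right, self.leaf = left, right, leaf

def route(node, x, evaluated=0):
    """Follow the deterministic path; count split evaluations."""
    if node.leaf is not None:
        return node.leaf, evaluated
    go_left = np.dot(node.w, x) + node.b > 0   # one split evaluation
    child = node.left if go_left else node.right
    return route(child, x, evaluated + 1)

# A depth-2 tree with 3 internal nodes and 4 leaves: any single input
# triggers only 2 of the 3 split evaluations.
tree = Node(w=np.array([1.0, 0.0]), b=0.0,
            left=Node(w=np.array([0.0, 1.0]), b=0.0,
                      left=Node(leaf="A"), right=Node(leaf="B")),
            right=Node(w=np.array([0.0, -1.0]), b=0.0,
                       left=Node(leaf="C"), right=Node(leaf="D")))

label, n_evals = route(tree, np.array([0.5, -0.2]))
print(label, n_evals)   # only 2 of the 3 internal nodes were evaluated
```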
