Abstract

Learning decision trees is a difficult optimization problem: it is nonconvex, nondifferentiable, and ranges over a huge number of tree structures. The dominant paradigm in practice, established in the 1980s, is axis-aligned trees trained with a greedy recursive partitioning algorithm such as CART or C5.0. Several non-greedy optimization algorithms, which optimize the parameters of all the nodes jointly, have been proposed recently; we compare some of them experimentally on a range of classification and regression datasets in terms of accuracy, training time, and tree size. The non-greedy algorithms do not improve significantly over CART, with one exception: tree alternating optimization (TAO). TAO scales to large datasets and produces axis-aligned and especially oblique trees that consistently outperform all other algorithms, often by a large margin. TAO makes oblique trees preferable to axis-aligned ones in many cases, since they are much more accurate while remaining small and interpretable. This suggests a change of paradigm in practical applications of decision trees.
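For context, a minimal sketch of the greedy axis-aligned baseline the abstract refers to, using scikit-learn's CART implementation. The dataset and hyperparameters here are illustrative assumptions, not those of the paper's experiments, and this is not the authors' code.

```python
# Sketch only: CART baseline (greedy recursive partitioning, axis-aligned splits).
# Dataset and max_depth are arbitrary choices for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each split thresholds a single feature (axis-aligned), chosen greedily to
# optimize an impurity criterion at that node; nodes are never revisited.
# Non-greedy methods such as TAO instead revisit and re-optimize the nodes
# of a fixed tree structure jointly, and can also learn oblique (linear
# combination) splits.
cart = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
print("CART test accuracy:", accuracy_score(y_test, cart.predict(X_test)))
print("tree size (nodes):", cart.tree_.node_count)
```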
