Post-pruning in regression tree induction: An integrated approach

K Oseibryson

doi:10.1016/j.eswa.2007.01.017

Abstract

The regression tree (RT) induction process has two major phases: the growth phase and the pruning phase. The pruning phase aims to generalize the RT produced in the growth phase by selecting a subtree that avoids over-fitting to the training data. Most post-pruning methods treat post-pruning as a single-objective problem (i.e., maximize validation accuracy) and consider simplicity (in terms of the number of leaves) only to break ties. However, it is well known that, apart from accuracy, other performance measures (e.g., stability, simplicity) are important for evaluating decision tree (DT) quality. In this paper we present an integrated approach to the post-pruning phase that simultaneously accommodates multiple performance measures that are important for evaluating RT quality, and obtains the optimal subtree based on user-provided preference and value function information.
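As a rough illustration of the idea of choosing a subtree on several measures at once, the sketch below scores candidate pruned subtrees with a weighted additive value function. It is not the paper's procedure: the `Candidate` fields, the min-max rescaling, the weights, and the example numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate subtree with three illustrative measures."""
    name: str
    val_error: float    # validation error (lower means more accurate)
    leaves: int         # number of leaves (lower means simpler)
    instability: float  # e.g. gap between training and validation error


def rescale(values):
    """Min-max rescale to [0, 1], where 1 is the best (smallest) raw value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(hi - v) / (hi - lo) for v in values]


def select_subtree(candidates, weights):
    """Return the candidate with the highest weighted score, plus all scores."""
    acc = rescale([c.val_error for c in candidates])
    simp = rescale([c.leaves for c in candidates])
    stab = rescale([c.instability for c in candidates])
    scores = [
        weights["accuracy"] * a
        + weights["simplicity"] * s
        + weights["stability"] * t
        for a, s, t in zip(acc, simp, stab)
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


# Made-up candidates, from the largest subtree (T1) to the smallest (T3).
candidates = [
    Candidate("T1", val_error=0.20, leaves=12, instability=0.08),
    Candidate("T2", val_error=0.22, leaves=6, instability=0.03),
    Candidate("T3", val_error=0.30, leaves=2, instability=0.01),
]

# Assumed user preferences: accuracy matters most, but not exclusively.
balanced = {"accuracy": 0.5, "simplicity": 0.25, "stability": 0.25}
accuracy_only = {"accuracy": 1.0, "simplicity": 0.0, "stability": 0.0}

best_balanced, _ = select_subtree(candidates, balanced)
best_accuracy, _ = select_subtree(candidates, accuracy_only)
print(best_balanced.name, best_accuracy.name)
```

With these made-up numbers, the accuracy-only weights pick the largest subtree, while the balanced weights pick the mid-sized one, which is the kind of trade-off a single-objective pruning rule cannot express.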

Full Text