Constructing decision trees with multiple response variables

Seong-Jun Kim, Kang Bae Lee

doi:10.1504/ijmdm.2003.003998

Abstract

Data mining is the process of discovering meaningful patterns in large data sets that are useful for decision making, and it has recently received considerable attention in a wide range of business and engineering fields. The decision tree, also known as recursive partitioning or rule induction, is one of the most frequently used methods for data mining. A decision tree, working on a divide-and-conquer basis, provides a set of rules for classifying samples in the learning data set. Most work on decision trees has addressed the case of a single response variable. However, situations in which multiple response variables must be considered arise in many applications, for example, manufacturing process monitoring, customer management, and clinical and health analysis. This article concerns constructing decision trees when there are two or more response variables in the data set. We investigate node homogeneity criteria such as entropy and the Gini index and then present three approaches to constructing decision trees with multiple response variables. To do so, we first describe extensions of entropy and the Gini index to the case in which multiple response variables are of concern. A weighting method for node splitting is also explained. Next, we present a decision tree minimising the expected loss due to misclassifications. To illustrate the procedures, numerical examples are given with discussions.
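
The abstract does not give the formulas behind the multi-response criteria, so the following is only an illustrative sketch and not the authors' method. It assumes one plausible reading of a weighting approach: compute the standard single-response entropy or Gini index for each response variable separately, then combine them as a weighted sum when scoring a candidate split. The function names, the weights, and the weighted-sum combination itself are hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Standard Gini index of a non-empty sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Standard Shannon entropy (in bits) of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_impurity(rows, weights, measure=gini):
    """ASSUMPTION: node impurity is a weighted sum of per-response impurities.

    rows    -- list of tuples, one class label per response variable
    weights -- one non-negative weight per response variable (hypothetical)
    """
    columns = zip(*rows)  # regroup the labels by response variable
    return sum(w * measure(col) for w, col in zip(weights, columns))


def split_gain(parent, left, right, weights, measure=gini):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    children = (
        len(left) / n * weighted_impurity(left, weights, measure)
        + len(right) / n * weighted_impurity(right, weights, measure)
    )
    return weighted_impurity(parent, weights, measure) - children


# Two response variables per sample; the split separates the first response
# perfectly but leaves the second one as mixed as before.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
gain = split_gain(rows, rows[:2], rows[2:], weights=(0.5, 0.5))
```

Under this assumption, raising the weight of a response variable makes the tree favour splits that purify that variable, which is one way the relative importance of the responses could be expressed.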
