Abstract

In this paper we propose an alternative basis for splitting a node of a decision tree. We use exactly the same tree-generation framework as ID3 so that the results can be compared directly. The splitting of the sample set is likewise done locally at each tree node, without considering earlier decisions about the partitioning of the samples, and only one attribute is used per split. We examine several splitting criteria. Contingency tables are a technique in nonparametric statistics for analyzing categorical (symbolic) populations; among their other useful applications, they support dependence tests between the rows and columns of a table. A sample set is inserted into a contingency table with the classes as columns and the values of an attribute as rows, and a variety of dependence measures can then be derived from the table. Results with respect to the two most important qualities of decision trees, error rate and tree complexity, are presented. For a set of selected benchmark examples, the performance of ID3 and the contingency-table approach is compared. It is shown that in many cases the contingency-table method exhibits lower estimated error rates or produces decision trees with fewer nodes.
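
To make the comparison concrete, the following is a minimal Python sketch of the two criteria side by side; it is not the paper's code. The helper names contingency_table, chi_square, and information_gain are illustrative, and the Pearson chi-square statistic is assumed here as one representative of the "variety of dependence measures" the abstract mentions.

```python
import math
from collections import Counter

def contingency_table(samples, attr_index):
    """Rows = values of one attribute, columns = classes.
    `samples` is a list of (attribute_vector, class_label) pairs."""
    table = {}
    classes = sorted({c for _, c in samples})
    for x, c in samples:
        table.setdefault(x[attr_index], Counter())[c] += 1
    return table, classes

def chi_square(table, classes):
    """Pearson chi-square statistic measuring dependence
    between the table's rows (attribute values) and columns (classes)."""
    col_totals = Counter()
    for row in table.values():
        col_totals.update(row)
    n = sum(col_totals.values())
    stat = 0.0
    for row in table.values():
        row_total = sum(row.values())
        for c in classes:
            expected = row_total * col_totals[c] / n
            if expected > 0:
                stat += (row[c] - expected) ** 2 / expected
    return stat

def information_gain(table, classes):
    """ID3's criterion: class entropy of the parent node minus the
    sample-weighted entropy of the children induced by the split."""
    def entropy(counts):
        total = sum(counts)
        return -sum((k / total) * math.log2(k / total) for k in counts if k)
    col_totals = Counter()
    for row in table.values():
        col_totals.update(row)
    n = sum(col_totals.values())
    parent = entropy([col_totals[c] for c in classes])
    children = sum(
        (sum(row.values()) / n) * entropy([row[c] for c in classes])
        for row in table.values()
    )
    return parent - children

# Toy usage: score each attribute under both criteria; the node is
# split on whichever attribute scores highest.
samples = [(("sunny", "hot"), "no"), (("sunny", "mild"), "yes"),
           (("rain", "hot"), "no"), (("rain", "mild"), "yes")]
for i in range(2):
    t, cls = contingency_table(samples, i)
    print(f"attribute {i}: chi2={chi_square(t, cls):.2f}, "
          f"gain={information_gain(t, cls):.2f}")
```

On this toy data both criteria agree (attribute 1 perfectly separates the classes, attribute 0 is independent of them); the paper's point is that on real benchmarks the two rankings can differ, which changes the resulting tree's error rate and size.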
