On Handling Tree-Structured Attributes in Decision Tree Learning

Hussein Almuallim

doi:10.1016/b978-1-55860-377-6.50011-6

Abstract

This paper studies the problem of learning decision trees when the attributes of the domain are tree-structured. We first describe two pre-processing approaches, the Quinlan-encoding and the bit-per-category methods, that re-encode the training examples in terms of new nominal attributes. We then introduce our own approach, which handles tree-structured attributes directly without the need for pre-processing. We show that our direct approach is more efficient than the bit-per-category approach. Because the two methods exhibit the same generalization behavior, our direct approach should always be preferred over the bit-per-category approach. The Quinlan-encoding approach and our direct approach have similar computational complexity, although we show experimentally that the direct approach runs roughly two to four times faster. We present experiments on natural and artificial data that suggest that our direct approach leads to better generalization performance than the Quinlan-encoding approach.

Full Text