Abstract

Many decision tree (DT) induction algorithms, including the popular C4.5, are based on the conditional entropy (CE) measure. An interesting question is how other entropy measures, such as class-attribute mutual information (CAMI), perform relative to CE. We therefore conducted a theoretical analysis of CAMI that exposed its relationships with CE and corrected a previously published CAMI result. Our computational study showed only small differences in performance between the two measures. Since feature selection is important in DT induction, we also conducted a theoretical analysis of a recently published blurring-based feature selection algorithm and developed a new feature selection algorithm. We tested this algorithm on a wider set of test problems than the comparable study used, in order to identify the benefits and limitations of blurring-based feature selection. These results provide theoretical and computational insight into entropy-based induction measures and feature selection algorithms.
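For reference, the standard information-theoretic definitions underlying these measures, with C the class variable and A a candidate attribute, are sketched below; the paper's exact CAMI formulation may differ (published CAMI variants often normalize the mutual information, which is one place performance differences can arise):

H(C \mid A) = -\sum_{a} p(a) \sum_{c} p(c \mid a) \log_2 p(c \mid a),
\qquad
I(C; A) = H(C) - H(C \mid A) = \sum_{c,a} p(c,a) \log_2 \frac{p(c,a)}{p(c)\,p(a)}.

Note that under these unnormalized definitions, minimizing H(C | A) and maximizing I(C; A) rank attributes identically for a fixed dataset, since H(C) is constant.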
