Abstract

Many decision tree (DT) induction algorithms, including the popular C4.5, are based on the conditional entropy (CE) measure. An interesting question is how other entropy measures, such as class-attribute mutual information (CAMI), perform relative to CE. We therefore conducted a theoretical analysis of CAMI that exposed its relationships with CE and corrected a previously published CAMI result. Our computational study showed only small differences in performance between the two measures. Since feature selection is important in DT induction, we also conducted a theoretical analysis of a recently published blurring-based feature selection algorithm and developed a new feature selection algorithm. We tested this algorithm on a wider set of test problems than the comparable study used, in order to identify the benefits and limitations of blurring-based feature selection. These results provide theoretical and computational insight into entropy-based induction measures and feature selection algorithms.
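For reference, the standard information-theoretic definitions underlying these measures, with C the class variable and A a candidate attribute, are sketched below; the paper's exact CAMI formulation may differ (published CAMI variants often normalize the mutual information, which is one place performance differences can arise):

H(C \mid A) = -\sum_{a} p(a) \sum_{c} p(c \mid a) \log_2 p(c \mid a),
\qquad
I(C; A) = H(C) - H(C \mid A) = \sum_{c,a} p(c,a) \log_2 \frac{p(c,a)}{p(c)\,p(a)}.

Note that under these unnormalized definitions, minimizing H(C | A) and maximizing I(C; A) rank attributes identically for a fixed dataset, since H(C) is constant.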
