Estimation of distribution algorithms for decision-tree induction

Henry E L Cagnini, Rodrigo C Barros, Marcio P Basgalupp

doi:10.1109/cec.2017.7969549

Abstract

Decision trees are among the most widely employed classification models, mainly because domain specialists can readily interpret and understand them. However, decision-tree induction algorithms are limited by the recursive top-down greedy search they typically implement. Such local search often degrades quality as the partitioning process proceeds, generating statistically insignificant rules. To avoid the typical greedy strategy and to prevent convergence to local optima, we present a novel Estimation of Distribution Algorithm (EDA) for decision-tree induction, named Ardennes. To evaluate the proposed approach, we present the results of an empirical analysis on 10 real-world classification datasets. We compare Ardennes with both a well-known traditional greedy algorithm for decision-tree induction and a more recent global population-based approach. Results show the feasibility of using EDAs as a means to avoid the previously described problems. We report gains when using Ardennes in terms of accuracy and, equally important, tree comprehensibility.
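
The abstract does not describe Ardennes' probabilistic model, so the following is only a minimal sketch of the general EDA loop it builds on: sample candidates from a probabilistic model, select the fittest, and re-estimate the model from them. It uses a univariate (UMDA-style) model on the toy OneMax problem rather than decision trees; the function name `umda_onemax`, all parameter values, and the probability clamp are assumptions chosen for illustration, not details from the paper.

```python
import random

def umda_onemax(n_bits=20, pop_size=60, n_select=20, generations=40, seed=0):
    """Generic univariate EDA (UMDA-style) on the OneMax toy problem.

    Illustrative only: this is NOT Ardennes and does not induce decision trees.
    Fitness of a bit string is simply the number of ones it contains.
    """
    rng = random.Random(seed)
    # Probabilistic model: independent probability of a 1 at each position.
    probs = [0.5] * n_bits
    best = None
    for _ in range(generations):
        # 1. Sample a population from the current model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # 2. Rank by fitness and keep the fittest individuals.
        population.sort(key=sum, reverse=True)
        elite = population[:n_select]
        if best is None or sum(elite[0]) > sum(best):
            best = elite[0]
        # 3. Re-estimate the model from the selected individuals.
        for i in range(n_bits):
            freq = sum(ind[i] for ind in elite) / n_select
            # Assumed clamp so that no bit value becomes impossible to sample.
            probs[i] = min(max(freq, 0.05), 0.95)
    return best, probs

if __name__ == "__main__":
    best, probs = umda_onemax()
    print("best fitness:", sum(best))
```

In a decision-tree setting, the bit string would be replaced by a tree encoding and the fitness by a measure such as validation accuracy; because whole candidates are sampled and evaluated at once, the search is global rather than a sequence of greedy local splits.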

Full Text