Abstract

When evaluating the process of building classification decision trees, it is necessary to assess both the performance of the constructed trees and the speed and efficiency of the algorithm. Top-down induction algorithms are relatively simple and can quickly generate good solutions; however, their deterministic nature often prevents them from finding globally optimal solutions. The evolutionary approach to decision tree building, on the other hand, has yielded promising results by exploring and exploiting the entire search space. However, the standard evolutionary method selects the two trees for crossover based on their fitness, which can lead to premature convergence to a local, often sub-optimal solution. To maintain population diversity over the course of evolution, we propose a novel selection method that takes the similarity of trees into account during crossover in order to prevent inbreeding. Several approaches for evaluating the similarity between trees were designed and implemented. Crossover of both similar and diverse trees was compared with the standard induction algorithm on twenty different data sets to determine the impact of similarity on the effectiveness and efficiency of the genetic algorithm.
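The abstract does not specify which similarity measures were used, so the following is only a minimal sketch of similarity-aware mate selection under stated assumptions. It assumes trees are binary nodes testing `feature <= threshold`, measures similarity as the Jaccard index of the sets of (feature, threshold) tests two trees contain, picks the first parent by ordinary tournament selection, and then picks the second parent as the least similar (or, for the "similar" variant, most similar) of a few random candidates. The names `Node`, `split_set`, `similarity`, `tournament`, and `select_parents`, and the parameters `k` and `n_mates`, are hypothetical and not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical binary decision-tree node: internal if `feature` is set, else a leaf."""
    feature: Optional[int] = None
    threshold: Optional[float] = None
    label: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def split_set(node: Optional[Node]) -> set:
    """Collect every (feature, threshold) test used anywhere in the tree."""
    if node is None or node.feature is None:
        return set()
    return {(node.feature, node.threshold)} | split_set(node.left) | split_set(node.right)


def similarity(a: Node, b: Node) -> float:
    """Assumed measure: Jaccard similarity of the two trees' split sets (1.0 = identical tests)."""
    sa, sb = split_set(a), split_set(b)
    if not sa and not sb:
        return 1.0  # two bare leaves are treated as identical
    return len(sa & sb) / len(sa | sb)


def tournament(fitness: List[float], k: int, rng: random.Random) -> int:
    """Standard fitness-based tournament: index of the fittest of k random individuals."""
    contenders = rng.sample(range(len(fitness)), k)
    return max(contenders, key=lambda i: fitness[i])


def select_parents(population: List[Node], fitness: List[float], k: int = 3,
                   n_mates: int = 3, prefer_diverse: bool = True,
                   rng: Optional[random.Random] = None) -> Tuple[Node, Node]:
    """Pick parent 1 by fitness, then parent 2 among n_mates random candidates by
    lowest (prefer_diverse=True) or highest (False) similarity to parent 1."""
    rng = rng or random.Random()
    first = tournament(fitness, k, rng)
    others = [i for i in range(len(population)) if i != first]
    candidates = rng.sample(others, n_mates)
    sim = lambda i: similarity(population[first], population[i])
    second = min(candidates, key=sim) if prefer_diverse else max(candidates, key=sim)
    return population[first], population[second]
```

In this sketch fitness still drives the choice of the first parent, so selection pressure is preserved; similarity only affects which mate it is paired with. Both `k` and `n_mates` must not exceed the number of individuals available to sample from.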
