Comparison of discretization methods for classifier decision trees and decision rules on medical data sets

Yılmaz Kaya, Ramazan Tekin

doi:10.31590/ejosat.1080098

Abstract

Real-life data sets are typically stored in databases as real-valued (continuous) attributes. However, many data mining methods, such as association rule mining and rule induction, require discrete attributes. It is therefore necessary to convert data sets with continuous attributes into data sets with discrete attributes. Discretization reduces the number of values of a continuous attribute by dividing its range into intervals. In this paper, eight discretization methods are compared in combination with the rule-based and tree-based classifiers JRip, OneR, J48, and PART. The experiments use ten-fold cross-validation on real-life data sets from the UCI repository. We show that discretization is an important step that significantly improves the classification results of these algorithms. The highest accuracy was obtained with MDL and J48 for the PIMA data set, with CAIM and JRip for the WBC data set, and with Extended Chi and J48 for the DERMA data set.
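
As a rough illustration of what dividing an attribute's range into intervals means, the following minimal sketch applies equal-width binning in plain Python. Equal-width binning is used here only because it is the simplest scheme; the abstract does not state that it is one of the eight methods compared. The function name `equal_width_bins` and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to an interval index in [0, n_bins - 1].

    Illustrative only: an unsupervised equal-width scheme, not one of the
    supervised methods (MDL, CAIM, Extended Chi) named in the abstract.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has a single interval.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Made-up sample values for a continuous attribute (not taken from the paper).
sample = [85.0, 89.0, 116.0, 137.0, 148.0, 183.0]
labels = equal_width_bins(sample, 3)
print(labels)
```

After this step the attribute takes at most three distinct values, which is the form that rule and tree learners restricted to discrete attributes can consume.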
