Comparative Study of Twoing and Entropy Criterion for Decision Tree Classification of Dispersed Data

Samuel Aning, Małgorzata Przybyła-Kasperek

doi:10.1016/j.procs.2022.09.301

Abstract

In decision tree building, the choice of splitting criterion strongly affects the quality of the resulting model. In this paper, decision tree models are built on dispersed data using the entropy measure and the twoing criterion as splitting criteria. Dispersed data here means data held in multiple independent local tables, on each of which decision tree models are built. Prediction vectors are generated from the local models, and the final prediction is obtained by aggregating them with majority voting. To improve model quality, an ensemble method (bagging) is applied to build multiple models for each local table. The main purpose of this paper is to compare the classification quality of decision tree models built on dispersed data using the entropy and twoing splitting measures. The main observation is that when knowledge is highly dispersed across many local tables, building decision tree models with the twoing criterion gives better results than using the entropy measure.
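As an illustration of the two splitting measures and the voting step named above, the following is a minimal sketch, not the authors' implementation. It assumes the textbook CART formulations: entropy-based information gain, and the twoing value (P_L * P_R / 4) * (sum over classes of |p(j|L) - p(j|R)|)^2. The function names (`entropy`, `information_gain`, `twoing`, `majority_vote`) are hypothetical, both sides of a split are assumed to be non-empty, and ties in the vote go to the label seen first, which may differ from the rule used in the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def information_gain(left, right):
    """Entropy reduction achieved by splitting a node into left and right."""
    parent = left + right
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def twoing(left, right):
    """Twoing value of a binary split (assumed standard CART formulation)."""
    parent = left + right
    n = len(parent)
    p_left, p_right = len(left) / n, len(right) / n
    left_counts, right_counts = Counter(left), Counter(right)
    # Sum of absolute differences between class proportions on each side.
    diff = sum(
        abs(left_counts[c] / len(left) - right_counts[c] / len(right))
        for c in set(parent)
    )
    return (p_left * p_right / 4) * diff ** 2


def majority_vote(predictions):
    """Most frequent label among the local models' predictions.

    Assumption: ties are resolved in favour of the label seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    left, right = ["a", "a"], ["b", "b"]  # a perfectly separating split
    print(information_gain(left, right))
    print(twoing(left, right))
    print(majority_vote(["a", "b", "a"]))  # one vote per local table
```

Both measures are maximised by a split that separates the classes perfectly and drop to zero when the two sides have identical class distributions; they can rank intermediate candidate splits differently, which is the kind of difference the comparison in this paper concerns.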

Full Text