The decision tree is a widely used non-parametric technique in machine learning, data mining, and pattern recognition. It is simple to understand and interpret, but it struggles with high-dimensional and class-imbalanced datasets, over-fitting, and instability. To overcome some of these issues, vertical partitioning approaches such as serial partitioning and theme-based partitioning have been used in the literature. A vertical partitioning approach divides the feature set into subsets of features (blocks) and uses these subsets for subsequent tasks. In this work, we use the ideas of the music rhythm tree to propose a novel vertical partitioning technique. It orders the features by their average correlation strength before partitioning the feature set. In our experiments, the proposed method achieves on average 13.8%, 6%, 9.8%, 19.7%, 9.4%, and 29.4% higher classification accuracy than C4.5, Random Forest, Bagging, AdaBoost, an ensemble technique, and a vertical partitioning technique, respectively. Our empirical results on 15 datasets demonstrate that the proposed vertical partitioning method is more stable and handles class-imbalanced data better. Finally, several standard statistical tests are conducted to validate the statistical significance of the results of the proposed method.
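
The ordering-then-partitioning idea can be illustrated with a minimal sketch. It is not the rhythm-tree procedure itself, which the abstract does not specify. It assumes that "average correlation strength" means the mean absolute Pearson correlation of a feature with all other features, and that blocks are contiguous, near-equal slices of the ordered feature list. The function names `order_by_avg_correlation` and `partition_into_blocks` are hypothetical.

```python
import numpy as np


def order_by_avg_correlation(X):
    """Return feature indices sorted by descending mean absolute correlation.

    Assumption: "correlation strength" is taken as |Pearson r| between
    feature columns. X must have at least two non-constant columns.
    """
    corr = np.corrcoef(X, rowvar=False)      # (d, d) feature-feature matrix
    abs_corr = np.abs(corr)                  # new array, safe to modify
    np.fill_diagonal(abs_corr, 0.0)          # ignore self-correlation
    d = abs_corr.shape[0]
    avg_strength = abs_corr.sum(axis=1) / (d - 1)
    return np.argsort(-avg_strength, kind="stable")


def partition_into_blocks(order, n_blocks):
    """Split the ordered feature indices into contiguous blocks.

    Assumption: blocks are near-equal in size; the paper's actual
    partitioning rule may differ.
    """
    return [block.tolist() for block in np.array_split(order, n_blocks)]


# Toy data: features 0 and 1 are perfectly correlated, feature 2 is
# uncorrelated with feature 0.
X = np.array([
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, -1.0],
    [4.0, 8.0, 1.0],
])
order = order_by_avg_correlation(X)
blocks = partition_into_blocks(order, n_blocks=2)
```

Each block could then be used to train a separate decision tree whose predictions are combined, which is one common way vertical partitioning is used; whether the proposed method does this is not stated in the abstract.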