Abstract

Classical decision trees such as C4.5 and CART partition the feature space using axis-parallel splits. Oblique decision trees instead split on linear combinations of features, which can simplify the structure of the decision boundary. Although oblique decision trees often achieve higher generalization accuracy, most oblique split methods do not apply directly to categorical data and are computationally expensive. In this paper, we propose a multiway-split decision tree (MSDT) algorithm that combines feature weighting with clustering. The method can combine multiple numerical features, multiple categorical features, or multiple mixed features. Experimental results show that MSDT performs well on multiple types of data.

Highlights

  • Despite the great success of deep neural network (DNN) models in image processing, speech recognition, and other fields in recent years, decision trees remain competitive with DNN schemes: they are interpretable, have fewer parameters, are robust to noise, and can be applied to large-scale data sets at lower computational cost. Therefore, the decision tree is still one of the hotspots in the field of machine learning today [1,2,3]. Research has mainly focused on construction methods for decision trees, split criteria [4], decision tree ensembles [5, 6], hybrids with other learners [7,8,9], decision trees for semisupervised learning [10], and so on

  • To keep time complexity manageable, the most popular algorithms, such as ID3 [15], C4.5 [16], and CART [17], and their various modifications [18], are greedy by nature and construct the decision tree in a top-down, recursive manner

  • Although binary trees can be used directly for multiclass problems, some binary split methods, such as FDA and the original SVM, rely on class labels, which limits algorithms such as Fisher’s decision tree (FDT) in [22] to binary classification problems


Summary

Introduction

Despite the great success of deep neural network (DNN) models in image processing, speech recognition, and other fields in recent years, decision trees remain competitive with DNN schemes: they are interpretable, have fewer parameters, are robust to noise, and can be applied to large-scale data sets at lower computational cost. Therefore, the decision tree is still one of the hotspots in the field of machine learning today [1,2,3]. Research has mainly focused on construction methods for decision trees, split criteria [4], decision tree ensembles [5, 6], hybrids with other learners [7,8,9], decision trees for semisupervised learning [10], and so on. To keep time complexity manageable, the most popular algorithms, such as ID3 [15], C4.5 [16], and CART [17], and their various modifications [18], are greedy by nature and construct the decision tree in a top-down, recursive manner. They act on only one dimension at a time, which results in axis-parallel splits. Searching for the optimal oblique hyperplanes is much more difficult than searching for the optimal axis-parallel hyperplanes, and numerous techniques have been applied to this problem, for example, hill climbing [17], simulated annealing [19], and genetic algorithms [20].

Preliminaries

The proposed decision tree method weights the features with the RELIEF-F algorithm and splits the nodes with a weighted k-means algorithm. Here dis_n(xi, μj) denotes the distance between a sample xi and a cluster center μj on the numerical variables, and dis_c(xi, μj) denotes the distance on the categorical variables. A parameter c ∈ [0, 1] adjusts the relative contribution of dis_n(xi, μj) and dis_c(xi, μj).
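The mixed distance described above can be sketched in code. This is a minimal illustration, not the paper's exact formulation: the function names are hypothetical, and the choices of a weighted Euclidean distance for dis_n, a weighted mismatch count for dis_c, and the convex combination c·dis_n + (1 − c)·dis_c are assumptions filled in from the surrounding text.

```python
import numpy as np

def mixed_distance(x_num, x_cat, mu_num, mu_cat, w_num, w_cat, c=0.5):
    """Feature-weighted distance between a sample and a cluster center
    on mixed data (a sketch; the paper's exact combination rule may differ).

    dis_n: weighted Euclidean distance on the numerical variables.
    dis_c: weighted mismatch count on the categorical variables.
    c in [0, 1] balances the numerical and categorical parts.
    """
    dis_n = np.sqrt(np.sum(w_num * (x_num - mu_num) ** 2))
    dis_c = np.sum(w_cat * (x_cat != mu_cat))
    return c * dis_n + (1.0 - c) * dis_c

def assign_cluster(x_num, x_cat, centers, w_num, w_cat, c=0.5):
    """One assignment step of the weighted k-means split: send the sample
    to the nearest center under the mixed distance."""
    dists = [mixed_distance(x_num, x_cat, mu_n, mu_c, w_num, w_cat, c)
             for mu_n, mu_c in centers]
    return int(np.argmin(dists))
```

In this sketch the weights w_num and w_cat would come from RELIEF-F, so that more relevant features contribute more to the split; each leaf of a multiway split then corresponds to one cluster.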


