Abstract

The use of prediction error to optimize the number of splitting rules in a tree model does not control the probability that splitting rules emerge with a predictor that has no functional relationship with the target variable. To solve this problem, a new optimization method is proposed. Under this method, the probability that any predictor used in the splitting rules of the optimized tree model has no functional relationship with the target variable is confined to less than 0.05. The tree model given by the new method can therefore be regarded, with high confidence, as representing knowledge contained in the data.

Highlights

  • In constructing a tree model (e.g., Breiman et al. [1], Takezawa [2], Chapter 9 of Chambers and Hastie [3]), the optimization of the number of decision rules is crucial; it is equivalent to the selection of predictors in multiple regression and to the optimization of smoothing parameters in nonparametric regression. 10-fold cross-validation, which partitions the sample into 10 groups at random, is a well-known technique among the diverse methods for this optimization

  • This method is based on the intuitive idea that, since the number of decision rules yielded by 10-fold cross-validation tends to be too large, a reduction in that number should be permitted within the range of the standard error of the prediction error; a large number of real examples of tree models demonstrate the efficacy of this method

  • The above simulations indicate that, when the number of decision rules is optimized using the prediction error given by 10-fold cross-validation and some of the predictors are not related to the target variable, the probability of constructing decision rules with a predictor not associated with the target variable is not constant
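The phenomenon described in the highlights can be reproduced with a small simulation. The sketch below is my own illustration (not the authors' simulation code): the second predictor is pure noise, the pruning strength is tuned by 10-fold cross-validation prediction error, and we count how often the optimized tree nevertheless splits on the noise predictor. The data-generating model and the `ccp_alpha` grid are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Illustrative simulation: X[:, 1] has no relationship with y, yet a tree
# tuned purely by 10-fold CV prediction error may still split on it.
rng = np.random.default_rng(1)
n_runs, n = 30, 100
grid = {"ccp_alpha": np.linspace(0.0, 0.1, 11)}  # pruning strengths to try

spurious = 0
for _ in range(n_runs):
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = np.sin(np.pi * X[:, 0]) + 0.3 * rng.standard_normal(n)  # X[:, 1] is noise
    search = GridSearchCV(DecisionTreeRegressor(random_state=0), grid,
                          cv=10, scoring="neg_mean_squared_error")
    search.fit(X, y)
    # A positive importance means the noise predictor appears in some split.
    if search.best_estimator_.feature_importances_[1] > 0:
        spurious += 1

print(f"runs splitting on the noise predictor: {spurious}/{n_runs}")
```

The fraction of runs with a spurious split depends on the sample size and noise level, which is the point of the highlight: prediction-error optimization does not hold this probability at a fixed, controlled level.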


Summary

Introduction

In constructing a tree model (e.g., Breiman et al. [1], Takezawa [2], Chapter 9 of Chambers and Hastie [3]), the optimization of the number of decision rules is crucial; it is equivalent to the selection of predictors in multiple regression and to the optimization of smoothing parameters in nonparametric regression. 10-fold cross-validation, which partitions the sample into 10 groups at random, is a well-known technique among the diverse methods of optimizing the number of decision rules. Experience with real data shows that the number of decision rules optimized by 10-fold cross-validation tends to be slightly larger than the optimum in terms of prediction error. We present a method of optimizing a tree model that accounts for the possibility that the model contains apparently inappropriate decision rules; the method does not use prediction error. It is an application of the method of selecting predictors that almost certainly have a functional relationship with the target variable. We show using real data that the relationship between the prediction error of a tree model and the appropriateness of its decision rules is not simple. A numerical simulation of the method and its application to the analysis of real data are presented.
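The conventional optimization described above can be sketched as follows. This is a minimal illustration, assuming CART-style cost-complexity pruning with scikit-learn, where `ccp_alpha` stands in for the number of decision rules; the data-generating model is hypothetical. It computes the 10-fold cross-validation error for each candidate complexity and also the one-standard-error selection that the highlights refer to.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: only the first predictor drives the response.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(np.pi * X[:, 0]) + 0.3 * rng.standard_normal(200)

# Candidate tree sizes, indexed by the cost-complexity parameter ccp_alpha.
path = DecisionTreeRegressor(random_state=0).fit(X, y).cost_complexity_pruning_path(X, y)
alphas = np.unique(path.ccp_alphas)

# 10-fold cross-validation: mean squared prediction error per candidate.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_mse = np.empty((len(alphas), 10))
for i, a in enumerate(alphas):
    for j, (tr, te) in enumerate(kf.split(X)):
        tree = DecisionTreeRegressor(ccp_alpha=a, random_state=0).fit(X[tr], y[tr])
        fold_mse[i, j] = np.mean((tree.predict(X[te]) - y[te]) ** 2)
cv_err = fold_mse.mean(axis=1)
cv_se = fold_mse.std(axis=1, ddof=1) / np.sqrt(10)

best = int(np.argmin(cv_err))
# 1-SE rule: largest ccp_alpha (i.e., simplest tree) whose CV error lies
# within one standard error of the minimum CV error.
within = np.flatnonzero(cv_err <= cv_err[best] + cv_se[best])
alpha_1se = alphas[within.max()]
print(f"alpha by min CV error: {alphas[best]:.4f}, by 1-SE rule: {alpha_1se:.4f}")
```

The 1-SE selection typically yields a smaller tree than the raw minimum, which illustrates why the minimum-CV-error tree "tends to be slightly larger than the optimal one"; the paper's proposed method replaces this prediction-error criterion altogether.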

Definition of the Problem Using Real Data
A New Method of Optimizing a Tree Model
Numerical Simulation
Real Data Example
Conclusions