Abstract

Chi-Squared Automatic Interaction Detection (CHAID) is an exploratory method for classifying data by building classification trees. The classification result are displayed in the form of a tree diagram model. After the model is formed, it is necessary to calculate the accuracy of the model. The goal is to see the performance of the model. The accuracy of this model can be determined by calculating the level of prediction error in the model. The error rate prediction method works by dividing data into training data and testing data. There are three methods in the error rate prediction method, such as Leave one out cross validation (LOOCV), Hold out, and k-fold cross validation. These methods have different performance in dividing data into training data and test data, so that each method has advantages and disadvantages. Therefore, a comparison of the three error rate prediction methods was carried out with the aim of determining the appropriate method for the CHAID. This research is included in experimental research and uses simulation data from data generation results in RStudio. This comparison is carried out by considering several factors, namely the marginal probability matrix and different correlations. The comparison results will be observed using a boxplot by looking at the median error rate and lowest variance. This research found that k-fold cross validation is the most suitable error rate prediction method applied to the CHAID method for balanced data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call