Comparison of Error Prediction Methods in Classification Modeling with CHAID Methods for Balanced Data

Findri Wara Putri, Fadhilah Fitri, Atus Amadi Putra, Dodi Vionanda

doi:10.24036/ujsds/vol1-iss5/116

Open Access

https://doi.org/10.24036/ujsds/vol1-iss5/116

Journal: UNP Journal of Statistics and Data Science
Publication Date: Nov 30, 2023
License type: CC BY 4.0

Abstract

Chi-Squared Automatic Interaction Detection (CHAID) is an exploratory method for classifying data by building classification trees. The classification results are displayed as a tree diagram. Once the model has been built, its accuracy must be assessed in order to judge its performance. This is done by estimating the model's prediction error rate. Error rate prediction methods work by dividing the data into training data and testing data. Three such methods are considered here: leave-one-out cross-validation (LOOCV), hold-out, and k-fold cross-validation. These methods divide the data into training and test sets in different ways, so each has its own advantages and disadvantages. The three methods were therefore compared in order to determine which is most appropriate for CHAID. This is an experimental study that uses simulated data generated in RStudio. The comparison takes several factors into account, namely the marginal probability matrix and different correlations. The results are compared using boxplots, looking at the median error rate and at which method has the lowest variance. The study found that k-fold cross-validation is the most suitable error rate prediction method for CHAID on balanced data.
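The three ways of splitting data described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the study used data generated in RStudio and a CHAID tree, whereas here a simple majority-class-per-category rule stands in for the classifier, and the data generator (`make_balanced_data`, with its `p_agree` parameter) is a hypothetical simplification that does not reproduce the paper's marginal probability matrix or correlation settings. All function names are assumptions introduced for illustration.

```python
import random
import statistics

def make_balanced_data(n, p_agree, rng):
    """Hypothetical generator: one binary predictor x, balanced binary class y.
    x equals y with probability p_agree."""
    data = []
    for i in range(n):
        y = i % 2  # alternating labels give (near) exactly balanced classes
        x = y if rng.random() < p_agree else 1 - y
        data.append((x, y))
    rng.shuffle(data)
    return data

def fit_majority_rule(train):
    """Stand-in classifier (NOT CHAID): majority class within each level of x."""
    counts = {}
    for x, y in train:
        counts.setdefault(x, [0, 0])[y] += 1
    total0 = sum(c[0] for c in counts.values())
    total1 = sum(c[1] for c in counts.values())
    default = 0 if total0 >= total1 else 1  # used for levels unseen in training
    rule = {x: (0 if c[0] >= c[1] else 1) for x, c in counts.items()}
    return lambda x: rule.get(x, default)

def count_errors(train, test):
    predict = fit_majority_rule(train)
    return sum(1 for x, y in test if predict(x) != y)

def hold_out(data, test_fraction=0.3):
    """Single split: the first part is the test set, the rest is training data."""
    n_test = max(1, int(round(len(data) * test_fraction)))
    return count_errors(data[n_test:], data[:n_test]) / n_test

def k_fold(data, k=5):
    """Each of k folds serves once as test data; errors are pooled over folds."""
    folds = [data[i::k] for i in range(k)]
    wrong = 0
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        wrong += count_errors(train, folds[i])
    return wrong / len(data)

def loocv(data):
    """Leave-one-out is k-fold with k equal to the number of observations."""
    return k_fold(data, k=len(data))

def simulate(n_reps=50, n=60, p_agree=0.8, seed=1):
    """Repeat the three estimates on freshly generated data sets."""
    rng = random.Random(seed)
    results = {"hold_out": [], "k_fold": [], "loocv": []}
    for _ in range(n_reps):
        data = make_balanced_data(n, p_agree, rng)
        results["hold_out"].append(hold_out(data))
        results["k_fold"].append(k_fold(data))
        results["loocv"].append(loocv(data))
    return results

if __name__ == "__main__":
    for name, errs in simulate().items():
        print(name, "median:", statistics.median(errs),
              "variance:", statistics.variance(errs))
```

Running the script prints a median and a variance of the estimated error rate for each method, which are the two quantities the abstract reads off its boxplots. The numbers from this toy setup say nothing about the paper's findings; the sketch only shows how the three estimates are computed.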

Full Text