Abstract

C4.5 is a highly effective decision tree algorithm widely used for classification. Compared to CHAID, CART, and ID3, C4.5 generates decision trees that are easier to understand and builds them faster, because it selects attributes according to their information content at each stage of tree construction. Once the decision tree model has been generated, its performance needs to be evaluated. One commonly used measure is the prediction error rate, which can be estimated in two ways: the train error rate, which uses the same data to build and to test the model and can therefore hide overfitting, and the test error rate, which divides the data into training and testing sets. Test error rate estimation includes cross-validation techniques such as Leave-One-Out Cross-Validation (LOOCV), Hold-Out (HO), and k-fold cross-validation. This research compares these three cross-validation methods for estimating the prediction error rate of the C4.5 algorithm. The study uses artificially generated, normally distributed data, comprising univariate, bivariate, and multivariate datasets with various combinations of mean differences and correlations. In the bivariate and multivariate data, different correlation structures are applied between two relevant variables and between relevant and irrelevant variables, at three correlation levels: no correlation, moderate correlation, and high correlation. The results indicate that k-fold cross-validation is the most suitable cross-validation method to apply to C4.5.
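To make the three estimators concrete, the following is a minimal Python sketch, not the study's implementation. It assumes a univariate two-class dataset drawn from normal distributions and, to stay self-contained, swaps in a one-split tree (a stump chosen by information gain) for the full C4.5 algorithm. All function names, the sample size, the mean difference, the 30% test fraction, and k = 10 are illustrative assumptions rather than settings taken from the study.

```python
import math
import random


def make_data(n_per_class, mean_diff, rng):
    """Univariate two-class data: class 0 ~ N(0, 1), class 1 ~ N(mean_diff, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    xs += [rng.gauss(mean_diff, 1.0) for _ in range(n_per_class)]
    ys = [0] * n_per_class + [1] * n_per_class
    return xs, ys


def entropy(labels):
    """Shannon entropy in bits of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)


def majority(labels, default):
    """Most frequent label; ties go to class 1, empty lists to `default`."""
    if not labels:
        return default
    return int(2 * sum(labels) >= len(labels))


def fit_stump(xs, ys):
    """One-split stand-in for C4.5: pick the threshold with maximal information gain."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    base = entropy(ys)
    best_gain, best_t = -1.0, pairs[0][0]
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if gain > best_gain:
            best_gain, best_t = gain, (pairs[i - 1][0] + pairs[i][0]) / 2.0
    overall = majority(list(ys), 0)
    left = [y for x, y in pairs if x <= best_t]
    right = [y for x, y in pairs if x > best_t]
    return best_t, majority(left, overall), majority(right, overall)


def count_errors(model, xs, ys):
    """Number of misclassified observations under a fitted stump."""
    t, left_label, right_label = model
    return sum((left_label if x <= t else right_label) != y for x, y in zip(xs, ys))


def hold_out_error(xs, ys, rng, test_frac=0.3):
    """Hold-out: train on one random part, test once on the remainder."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = max(1, int(round(test_frac * len(idx))))
    test, train = idx[:n_test], idx[n_test:]
    model = fit_stump([xs[i] for i in train], [ys[i] for i in train])
    return count_errors(model, [xs[i] for i in test], [ys[i] for i in test]) / n_test


def k_fold_error(xs, ys, k, rng):
    """k-fold CV: each observation is tested exactly once, by a model that never saw it."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    errors = 0
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        model = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        errors += count_errors(model, [xs[i] for i in test], [ys[i] for i in test])
    return errors / len(xs)


def loocv_error(xs, ys, rng):
    """LOOCV is k-fold CV with k equal to the number of observations."""
    return k_fold_error(xs, ys, len(xs), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = make_data(n_per_class=50, mean_diff=2.0, rng=rng)
    print("hold-out:", hold_out_error(xs, ys, rng))
    print("10-fold: ", k_fold_error(xs, ys, 10, rng))
    print("LOOCV:   ", loocv_error(xs, ys, rng))
```

In this sketch the hold-out estimate depends on a single random split, the k-fold estimate averages over k splits, and the LOOCV estimate does not depend on the shuffle at all but requires one model fit per observation. A comparison along the lines described above would repeat such runs over many simulated datasets while varying the mean difference and correlation structure.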
