Abstract

The C5.0 and CART algorithms are similar in terms of velocity and handling of categorical and numeric type data. However, these two algorithms are differences in terms the CART algorithm is binary and classifies categorical, numerical and continuous response variables resulting in classification and regression decision trees. Meanwhile, the C5.0 algorithm is non-binary and classifies categorical response variables resulting in a classification tree. This research aims to classify the Kaggle’s Stroke Prediction Dataset to find out the variables that most influence the risk of stroke, as well as to compare the results of the classification accuracy of the both algorithms. The results of the study showed that CART algorithm has a higher value of accuracy and precision, but its recall value is lower than C5.0. The accuracy value of each algorithm is 77.9% and 77.5%, presision is 89.5% and 83.2%, recall is 67% and 71.4%. Overrall, it can be concluded that there is no difference in classification between the two algorithm. Beside that, in the CART there were 3 variables that most influence on stroke risk, they are age, BMI, and average blood glucose levels. Meanwhile, in C5.0 only 2 variable that most influence, there are age and average blood glucose levels.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call