Improved Accuracy In Data Mining Decision Tree Classification Using Adaptive Boosting (Adaboost)

Muhammad Zarlis, Muhammad Riansyah, Saib Suwilo

doi:10.33395/sinkron.v8i2.12055

Abstract

The Decision Tree algorithm is a data mining method that is often applied to classification problems. The Decision Tree C5.0 algorithm has several weaknesses: like several other decision tree methods, it is often biased towards splits on features that have many levels; the model can over-fit or under-fit; small changes to the training data can result in big changes to the decision logic; modeling with C5.0 can be inconvenient; and data imbalance causes low accuracy. Boosting is an iterative algorithm that assigns different weights to the distribution of training data in each iteration. Each boosting iteration increases the weight of misclassified examples and decreases the weight of correctly classified examples, thereby effectively changing the distribution of the training data. One example of a boosting algorithm is AdaBoost. The purpose of this research is to improve the performance of the Decision Tree C5.0 classification method using adaptive boosting (AdaBoost) to predict hepatitis disease, with performance evaluated using the confusion matrix. In tests on the Hepatitis dataset evaluated with the confusion matrix, the Decision Tree C5.0 classification has an accuracy rate of 80.85% with a classification error rate of 19.15%, whereas Decision Tree C5.0 with AdaBoost has a higher accuracy rate of 82.98% and a classification error rate of 17.02%. This difference is attributed to AdaBoost, which is able to turn a weak classifier into a strong classifier by increasing the weight of misclassified observations, and which is also able to reduce the classifier error rate.
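
The reweighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-level decision stumps as the weak classifier in place of C5.0, labels coded as +1 and -1, and a ten-point toy dataset invented purely for illustration rather than the Hepatitis data. All function names (stump_predict, fit_stump, adaboost_fit, adaboost_predict, accuracy_and_error) are hypothetical.

```python
import math

def stump_predict(stump, x):
    """Weak classifier: predict `polarity` if x[feature] <= threshold, else the opposite."""
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity

def fit_stump(X, y, w):
    """Return the stump with the lowest weighted error under weights w."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                stump = (f, t, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost_fit(X, y, rounds=3):
    """Train an ensemble of (alpha, stump) pairs with discrete AdaBoost."""
    n = len(X)
    w = [1.0 / n] * n                      # start from a uniform distribution
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified examples (y * h(x) = -1) gain weight,
        # correctly classified examples (y * h(x) = +1) lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]       # renormalise to a distribution
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

def accuracy_and_error(y_true, y_pred):
    """Accuracy and error rate derived from the 2x2 confusion matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return acc, 1 - acc

# Toy data (assumption, for illustration only): no single threshold separates it.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = adaboost_fit(X, y, rounds=rounds)
    preds = [adaboost_predict(model, x) for x in X]
    print(rounds, accuracy_and_error(y, preds))
```

On this toy data a single stump cannot separate the classes, while the weighted vote of a few stumps can, which is the weak-to-strong effect the abstract refers to. Swapping the stump for a full decision tree learner would bring the sketch closer to the setup studied in the paper.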
