Diabetes Classification on an Imbalanced Class Dataset Using the Stacking Algorithm

Yoga Pristyanto, Atik Nurmasani, Acihmah Sidauruk

doi:10.30865/mib.v6i1.3442

Open Access

Abstract

Diabetes is a potentially fatal disease. According to the IDF (International Diabetes Federation), 463 million people worldwide had diabetes in 2019. According to the Ministry of Health, Indonesia ranks among the top 10 countries in the world by number of people with diabetes. Machine learning models can support early detection of diabetes based on patient history and available data. Most previous research uses a single classifier, which performs poorly when the dataset has an imbalanced class distribution. This study therefore uses a Stacking model for classification and prediction on diabetes datasets. The goal is to improve on the performance of a single classifier and to offer Stacking as one solution for classifying diabetes in imbalanced class datasets. Two experiments were carried out on two different datasets. On the first dataset, the Stacking algorithm achieved an accuracy of 89%, a TPR of 89%, a TNR of 85%, and a G-Mean of 86.98%. On the second dataset, it achieved an accuracy of 96%, a TPR of 96%, a TNR of 94%, and a G-Mean of 94.99%. On all four indicators, these results are better than those of the single classifiers C4.5, K-NN, and SVM on both diabetes datasets. Thus, the proposed algorithm, Stacking (C4.5-SVM), can be a solution for classifying diabetes datasets with imbalanced class distributions.
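The relationship between the reported TPR, TNR, and G-Mean values can be checked with a short sketch. It assumes the common definition of G-Mean as the geometric mean of TPR and TNR, which the abstract does not state explicitly but which is consistent with the reported figures. The function names `g_mean` and `rates_from_confusion` are illustrative and do not come from the paper.

```python
import math


def rates_from_confusion(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (TPR, TNR) from binary confusion-matrix counts."""
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    return tpr, tnr


def g_mean(tpr: float, tnr: float) -> float:
    """Geometric mean of TPR and TNR (assumed definition of G-Mean)."""
    return math.sqrt(tpr * tnr)


# TPR and TNR values as reported in the abstract for the two datasets.
first = g_mean(0.89, 0.85)
second = g_mean(0.96, 0.94)

print(f"Dataset 1 G-Mean: {first:.4f}")
print(f"Dataset 2 G-Mean: {second:.4f}")
```

Under this assumed definition, both computed values agree with the reported G-Mean figures to within rounding. Because G-Mean drops sharply when either rate is low, it is less forgiving than accuracy of a classifier that neglects the minority class.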
