Diabetes mellitus is a disorder of glucose metabolism that contributes significantly to mortality. Various studies have applied machine learning models to early detection and classification as part of diabetes mellitus prevention efforts. A recurring problem is weak model performance and misclassification caused by imbalanced data: when one (majority) class dominates, the model identifies the minority class poorly. This paper proposes handling imbalanced data with the synthetic minority oversampling technique (SMOTE) and observes its effect on the classification performance of the support vector machine (SVM) and Backpropagation artificial neural network (ANN) methods. In the experiment, on the imbalanced data the SVM achieved 94.31% accuracy and the Backpropagation ANN achieved 91.56% accuracy. On the balanced data, the SVM achieved 98.85% accuracy and the Backpropagation ANN achieved 94.90% accuracy. The results show that oversampling can improve the performance of the classification model for each data class.
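To make the oversampling step concrete, the following is a minimal sketch of the idea behind SMOTE: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. It is a simplified illustration, not the implementation used in the paper. The function name `smote`, its parameters, and the toy data are assumptions made for this example, and the base sample is picked at random rather than by iterating over every minority sample as in the original algorithm.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data (not the paper's dataset): 10 majority, 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(10)]
minority = [(20.0, 20.0), (21.0, 20.5), (20.5, 21.0)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class instead of duplicating existing rows. In practice the oversampling would be applied to the training split only, so that the test data contains no synthetic samples.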