Abstract

Classification is a data mining technique used for diagnosis in the medical field, where its value is measured by the accuracy it achieves. The accuracy of a classification algorithm is influenced by the features and dimensionality of the dataset. This study uses the Chronic Kidney Disease (CKD) dataset, which is a high-dimensional dataset. The Support Vector Machine (SVM) algorithm is used because of its ability to handle high-dimensional data. The dataset consists of 24 attributes and 1 class; using all of the attributes diminishes classification accuracy. Feature selection with Particle Swarm Optimization (PSO) is applied to remove redundant features and produce an optimal feature subset. In addition, the AdaBoost ensemble method is applied to improve the overall performance of the classification algorithm. The results show that optimizing SVM with PSO for feature selection and AdaBoost as the ensemble method, with an average of 18 selected features, increases accuracy by 36.20 percentage points to 99.50% in the diagnosis of CKD, compared with 63.30% for SVM without optimization. This research can serve as a reference for further work focusing on the preprocessing stage.

Highlights

  • Data can be collected from various sources and grow very large

  • The results show that optimizing the Support Vector Machine (SVM) algorithm with Particle Swarm Optimization (PSO) for feature selection and AdaBoost as the ensemble method, with an average of 18 selected features, increases accuracy by 36.20 percentage points to 99.50% in the diagnosis of Chronic Kidney Disease (CKD), compared with 63.30% for SVM without optimization

  • PSO and the AdaBoost ensemble method are applied to optimize the SVM classification algorithm on the CKD dataset taken from the UCI Machine Learning Repository
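
The PSO-based feature selection named in the highlights can be illustrated with a minimal sketch. It assumes a binary PSO variant with a sigmoid transfer function, which the text above does not confirm as the authors' exact method. The function `binary_pso`, the `toy_fitness` stand-in, and the `INFORMATIVE` index set are all hypothetical; in a study like this one, the fitness would instead be something like the cross-validated accuracy of the SVM (optionally boosted with AdaBoost) trained on the selected columns.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=20, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO sketch: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    # Random initial masks and velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the sigmoid never saturates fully.
                vel[i][d] = max(-4.0, min(4.0, v))
                # Sigmoid transfer: velocity becomes the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Hypothetical stand-in for classifier accuracy: rewards a made-up set of
# "informative" features and lightly penalizes every other selected feature.
INFORMATIVE = {0, 2, 3, 7, 11}

def toy_fitness(mask):
    hits = sum(1 for d, bit in enumerate(mask) if bit and d in INFORMATIVE)
    extras = sum(1 for d, bit in enumerate(mask) if bit and d not in INFORMATIVE)
    return hits - 0.1 * extras

# 24 matches the number of attributes in the CKD dataset.
best_mask, best_fit = binary_pso(toy_fitness, n_features=24)
selected = [d for d, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected)
print("fitness:", best_fit)
```

Swapping `toy_fitness` for a function that trains and scores a classifier on the masked columns turns this into wrapper-style feature selection. The swarm size, inertia weight, and acceleration coefficients here are illustrative defaults, not values reported in the study.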


Summary

Introduction

Data can be collected from various sources and grow very large. If this data is not utilized, it becomes nothing more than a pile of useless data. Data mining techniques can extract useful knowledge hidden in large databases [1] [2]. Data mining can be considered a tool for obtaining knowledge from raw data, including data in the medical field that is not meaningful on its own. Data mining tasks are divided into two groups, namely predictive/classification algorithms and regression algorithms, both of which are learned through a supervised learning process [3]. Data mining can be used to predict disease from a patient’s medical record by means of classification [4].

