Abstract

Health data often contain many irrelevant features, which can reduce the performance of a classification method. Two health datasets with many attributes are the Pima Indian Diabetes dataset and the Thyroid dataset. Diabetes is a deadly disease caused by elevated blood sugar resulting from the body's inability to produce enough insulin, and its complications can lead to heart attacks and strokes. The purpose of this research is to combine the Correlated Naïve Bayes method with Wrapper-based feature selection for the classification of health data. The research consists of four stages: (1) collection of the Pima Indian Diabetes and Thyroid datasets from the UCI Machine Learning Repository; (2) data pre-processing, comprising transformation, scaling, and Wrapper-based feature selection; (3) classification using the Correlated Naïve Bayes and Naïve Bayes methods; and (4) performance testing based on accuracy using 10-fold cross-validation. The combination of Correlated Naïve Bayes and Wrapper-based feature selection achieves the best accuracy on both datasets: 71.4% for the Pima Indian Diabetes dataset and 79.38% for the Thyroid dataset. Compared with classification without feature selection, this is an increase of 4.1% for the Pima Indian Diabetes dataset and 0.48% for the Thyroid dataset.

Highlights

  • Health data often contain irrelevant features that can reduce the performance of classification methods

  • Diabetes is a deadly disease caused by increased blood sugar

  • The Pima Indian Diabetes and Thyroid datasets come from the UCI Machine Learning Repository


Summary

INTRODUCTION

Improving classification accuracy on health data requires a classification method that performs well. Feature selection is used to select influential features and remove irrelevant ones from the dataset's attributes, which shortens computation time and can improve the performance of the classification method [4], [5]. The classification method used in this research is the Correlated Naive Bayes algorithm. Based on the discussion above, the gap between this research and previous work is that no study has yet combined the Correlated Naive Bayes method with Wrapper-based feature selection to classify health data with many features. For comparison, the study in [13] used the Correlated Naive Bayes algorithm to classify diabetes without feature selection. This research combines the Correlated Naive Bayes algorithm with Wrapper-based feature selection for health data classification in order to obtain optimal accuracy.
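As an illustration of how wrapper-based feature selection can be paired with a Naive Bayes classifier and scored by 10-fold cross-validation, the following is a minimal sketch under stated assumptions, not the authors' implementation. The Correlated Naive Bayes variant is not specified in this summary, so an ordinary Gaussian Naive Bayes stands in for it. The greedy forward search is one common wrapper strategy and is assumed here. The data are synthetic (one informative feature, two noise features), and all function names are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def fit_gnb(X, y):
    """Fit a plain Gaussian Naive Bayes model (stand-in for Correlated Naive Bayes)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),                   # class prior
            [mean(c) for c in cols],              # per-feature means
            [pvariance(c) + 1e-9 for c in cols],  # variances with a small floor
        )
    return model


def predict_gnb(model, x):
    """Return the class with the highest log-posterior for one sample."""
    def log_post(prior, mus, variances):
        total = math.log(prior)
        for v, mu, var in zip(x, mus, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_post(*model[label]))


def cv_accuracy(X, y, features, k=10, seed=0):
    """Mean accuracy over k folds, using only the given feature indices."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    Xs = [[row[f] for f in features] for row in X]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gnb([Xs[i] for i in train], [y[i] for i in train])
        hits = sum(predict_gnb(model, Xs[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return mean(scores)


def forward_wrapper_selection(X, y, k=10):
    """Greedy forward search: add the feature that most improves CV accuracy."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        score, feat = max((cv_accuracy(X, y, selected + [f], k), f) for f in remaining)
        if score <= best:  # stop when no candidate improves the score
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Synthetic data (assumption): feature 0 separates the classes, features 1-2 are noise.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[label * 3.0 + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

selected, acc = forward_wrapper_selection(X, y)
print("selected feature indices:", selected, "10-fold accuracy:", round(acc, 3))
```

The wrapper idea is visible in `forward_wrapper_selection`: each candidate feature subset is scored by the cross-validated accuracy of the classifier itself, rather than by a classifier-independent statistic. In this sketch, applying it to the Pima Indian Diabetes or Thyroid data would only require replacing the synthetic `X` and `y`.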

METHOD
Data Collection
Data Pre-processing
Performance Testing
CLOSING