Abstract

Technological development has led to a proliferation of data, which must be processed into useful information for everyday needs. Data mining is a set of techniques for converting data into useful information, and it is widely used for prediction tasks, for example in health and medical science. This study uses the Wisconsin Diagnostic Breast Cancer dataset from the UCI Machine Learning Repository. The dataset has 32 attributes and 569 samples. Because the data are continuous and high-dimensional, the C4.5 algorithm requires long computation time and extensive storage. This study aims to improve the accuracy of C4.5 by combining it with K-Means and a Genetic Algorithm. It compares the accuracy of C4.5 for diagnosing breast cancer before and after applying that combination. The accuracy of C4.5 alone is 91.228%. After optimization with K-Means and the Genetic Algorithm, the accuracy is 94.824%, with an average of 22 features selected. Thus, applying K-Means and the Genetic Algorithm to C4.5 improves the accuracy of breast cancer diagnosis by 3.596 percentage points.
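The abstract does not spell out how K-Means is combined with C4.5. One plausible reading is that K-Means discretizes each continuous attribute into a small number of bins before the tree is built. The sketch below illustrates that idea on a single attribute under that assumption. The function names `kmeans_1d` and `discretize`, the choice of three bins, and the toy values are all hypothetical and are not taken from the study or the dataset.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster 1-D values with Lloyd's algorithm and return the sorted centroids."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: pick k centroids spread evenly across the sorted data.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def discretize(values, centroids):
    """Replace each continuous value with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]


# Toy values standing in for one continuous attribute (hypothetical, not real data).
attribute = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.9, 10.2, 10.0]
centroids = kmeans_1d(attribute, k=3)
bins = discretize(attribute, centroids)
print(bins)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Under this reading, the discretized attributes would then be passed to a feature-selection step and a decision tree learner. Neither of those steps is shown here.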
