Abstract

Chronic kidney disease (CKD) is one of the most serious health concerns of the twenty-first century, affecting over 37 million Americans. Applying machine learning (ML) techniques to clinical data allows CKD to be diagnosed early, and early detection can prevent many deaths. This work uses a clinical data set of 400 patients, available from the UCI repository. The data set does not contain equal numbers of CKD and non-CKD samples, and this class imbalance strongly influences the learning capability of classifiers. Genetic Programming (GP) is an ML technique based on the evolution of species. GP with a standard fitness function is also affected by the imbalance. A new Euclidean distance-based fitness function for GP is proposed to handle the imbalanced nature of the data set. To assess the robustness of the proposed approach, other classification techniques are also applied for comparison: K-nearest neighbors (KNN), KNN with particle swarm optimization (PSO-KNN), and GP with the standard fitness function. Under ten-fold cross-validation, KNN achieves an accuracy of 83.54% with an AUC of 0.69, PSO-KNN achieves an accuracy of 96.79% with an AUC of 0.94, and GP with the newly proposed fitness function outperforms both, achieving an accuracy of 99.33% with an AUC of 0.99.
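
The abstract does not give the formula for the proposed fitness function, so the following is only a minimal sketch of one way a Euclidean distance-based fitness could counter class imbalance. It assumes the fitness is derived from the distance between a classifier's (true positive rate, true negative rate) pair and the ideal point (1, 1). The function names and the toy 90/10 labels are hypothetical and are not taken from the paper or from the UCI data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_fitness(y_true, y_pred):
    """Standard fitness: the fraction of correctly classified samples."""
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def distance_fitness(y_true, y_pred):
    """Assumed formulation: 1 minus the normalized Euclidean distance
    from (TPR, TNR) to the ideal point (1, 1). Higher is better."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    dist = math.hypot(1.0 - tpr, 1.0 - tnr)
    return 1.0 - dist / math.sqrt(2.0)  # sqrt(2) is the largest possible distance


# Hypothetical toy labels: 90 positive samples followed by 10 negative samples.
y_true = [1] * 90 + [0] * 10

# Classifier A predicts the majority class for every sample.
majority = [1] * 100

# Classifier B makes errors in both classes: 9 missed positives, 1 false alarm.
balanced = [1] * 81 + [0] * 9 + [0] * 9 + [1] * 1

for name, pred in (("majority", majority), ("balanced", balanced)):
    print(name,
          round(accuracy_fitness(y_true, pred), 3),
          round(distance_fitness(y_true, pred), 3))
```

Both toy classifiers classify 90 of the 100 samples correctly, so an accuracy-based fitness cannot separate them. The distance-based score penalizes the majority-class predictor because its true negative rate is zero, which is the kind of behavior an imbalance-aware fitness function is meant to discourage during evolution.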
