Abstract

In today's competitive world, people's lifestyles are highly irregular. Irregular eating habits, daily routines, and workloads create stress that in turn contributes to many chronic diseases. Four of the most prominent chronic diseases are heart disease, cancer, diabetes, and kidney disease. This chapter analyzes the potential of individual machine learning algorithms for predicting severe kidney disease. The kidneys are among the most important organs of the human body. They are responsible for purifying the blood: they filter excess fluids and wastes from the blood so that these can be excreted through urine. In the advanced stage of kidney disease, the kidneys can no longer filter surplus fluid, wastes, and electrolytes from the blood, so these remain in the body. Kidney damage has many causes, including excessive alcohol consumption, overuse of antibiotics, smoking, family history, high blood pressure, and obesity, and it severely reduces a person's quality of life. To assess the likelihood of kidney disease, a doctor examines the patient’s key symptoms and orders blood, urine, and imaging tests. Based on these tests and examinations, the doctor confirms whether the individual is suffering from a kidney-related problem. Machine learning models can predict and classify, and these capabilities can help doctors cross-check their conclusions and thereby improve diagnostic accuracy. Only a correct diagnosis makes better treatment possible and so increases the chances of survival. The dataset used in this work is taken from the University of California Irvine Knowledge Discovery in Databases (UCI KDD) site. 
The data have 25 attributes and 400 records divided into two classes: chronic kidney disease (CKD) and non-CKD. Preprocessing steps make the data more suitable for analysis: handling missing values, removing noise, and normalizing attributes. The chapter then ranks the features using several feature selection techniques (information gain, information gain ratio, Gini index, Relief, FCBF, and chi-square) and presents a comparative study of these methods. Feature ranking supports feature selection and feature reduction, which improves both running time and accuracy. Next, the performance of several machine learning algorithms is recorded: k-nearest neighbor, artificial neural network (ANN), support vector machine (SVM), AdaBoost, random forest, and naïve Bayes classifier. Regression analysis is also recorded and evaluated using k-fold cross-validation. After varying the algorithms' hyperparameters, the number of features, and the combinations of features, the models are evaluated using measures such as accuracy, precision, recall, and F1 measure. The results of the standard machine learning algorithms are compared for the early diagnosis of CKD across different parameters and numbers of features, which should help researchers tune the parameters of their own models. With information gain ratio, Gini index, and chi-square as feature selection methods, all models achieve accuracy greater than 96%, and some reach 100%. This level of accuracy is sufficient to assist doctors in reaching a correct diagnosis so that they can provide better treatment.
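
As an illustration of two steps named above, the following is a minimal sketch of ranking categorical features by information gain (or gain ratio) and of computing accuracy, precision, recall, and F1 for a two-class problem. It is not the chapter's implementation: the records are made up rather than taken from the UCI data, the feature names (hypertension, appetite, anemia) and the label strings are hypothetical, and all function names are introduced here for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting labels on a categorical feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def rank_features(table, labels, score=information_gain):
    """Return feature names sorted from most to least informative."""
    return sorted(table, key=lambda name: score(table[name], labels), reverse=True)


def binary_metrics(y_true, y_pred, positive="ckd"):
    """Accuracy, precision, recall, and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy, made-up records (NOT the UCI data); feature names are hypothetical.
labels = ["ckd", "ckd", "ckd", "ckd", "notckd", "notckd", "notckd", "notckd"]
table = {
    "hypertension": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
    "appetite": ["poor", "poor", "good", "good", "good", "good", "good", "poor"],
    "anemia": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
}

ranking = rank_features(table, labels)                 # by information gain
ranking_gr = rank_features(table, labels, gain_ratio)  # by gain ratio
```

In this toy table the hypothetical hypertension column separates the two classes perfectly and is ranked first, while the anemia column carries no class information and is ranked last. In a full pipeline, `binary_metrics` would be applied to the predictions from each fold of a k-fold cross-validation and the per-fold values averaged.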
