Comparison of Machine Learning With Logistic Regression for Prediction of Chronic Kidney Disease in the Thai Adult Population

Ratchainant Thammasudjarit,Sigit Ari Saputro,Punnathorn Ingsathit,Ammarin Thakkinstian,Atiporn Ingsathit

doi:10.33165/rmj.2021.44.4.250334

Ratchainant Thammasudjarit, Sigit Ari Saputro + Show 3 more

Open Access

https://doi.org/10.33165/rmj.2021.44.4.250334

Copy DOI

Abstract

Background: Chronic kidney disease (CKD) takes huge amounts of resources for treatments. Early detection of patients by risk prediction model should be useful in identifying risk patients and providing early treatments. Objective: To compare the performance of traditional logistic regression with machine learning (ML) in predicting the risk of CKD in Thai population. Methods: This study used Thai Screening and Early Evaluation of Kidney Disease (SEEK) data. Seventeen features were firstly considered in constructing prediction models using logistic regression and 4 MLs (Random Forest, Naïve Bayes, Decision Tree, and Neural Network). Data were split into train and test data with a ratio of 70:30. Performances of the model were assessed by estimating recall, C statistics, accuracy, F1, and precision. Results: Seven out of 17 features were included in the prediction models. A logistic regression model could well discriminate CKD from non-CKD patients with the C statistics of 0.79 and 0.78 in the train and test data. The Neural Network performed best among ML followed by a Random Forest, Naïve Bayes, and a Decision Tree with the corresponding C statistics of 0.82, 0.80, 0.78, and 0.77 in training data set. Performance of these corresponding models in testing data decreased about 5%, 3%, 1%, and 2% relative to the logistic model by 2%. Conclusions: Risk prediction model of CKD constructed by the logit equation may yield better discrimination and lower tendency to get overfitting relative to ML models including the Neural Network and Random Forest.

Full Text