Risk prediction of diabetic nephropathy using machine learning techniques: A pilot study with secondary data

Md Maniruzzaman, Md Merajul Islam, Md Jahanur Rahman, Md Al Mehedi Hasan, Jungpil Shin

doi:10.1016/j.dsx.2021.102263

Abstract

Aims: This work presents a comparative study of machine learning (ML) techniques with two objectives: (i) to determine the risk factors of diabetic nephropathy (DN) using principal component analysis (PCA) at different cutoffs; and (ii) to predict DN patients using ML-based techniques.

Methods: PCA is combined with ML-based techniques to select the best features at different PCA cutoff values and to choose the optimal PCA cutoff, that is, the cutoff at which the ML-based techniques give the highest accuracy. These optimal features are fed into six ML-based techniques: linear discriminant analysis, support vector machine (SVM), logistic regression, k-nearest neighbors, naïve Bayes, and artificial neural network. A leave-one-out cross-validation protocol is used, and the performance of the ML-based techniques is compared using accuracy and area under the curve (AUC).

Results: The data used in this work comprise 133 respondents, including 73 DN patients with an average age of 69.6±10.2 years; 54.2% of the DN patients are female. Our findings show that PCA combined with an SVM classifier with a radial basis function kernel (SVM-RBF) yields 88.7% accuracy and an AUC of 0.91 at a PCA cutoff of 0.96.

Conclusions: This study suggests that PCA combined with the SVM-RBF classifier may classify DN patients with higher accuracy than the models published in existing research. Prospective studies are warranted to further validate the applicability of our model in clinical settings.
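The evaluation protocol named in the abstract (leave-one-out cross-validation scored by accuracy and AUC) can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: it uses a k-nearest-neighbour vote as a stand-in classifier (one of the six techniques listed), omits the PCA step and the SVM-RBF model, and runs on a small hypothetical two-feature dataset invented for the example. All function names are assumptions made here.

```python
from math import dist

def knn_score(train_X, train_y, x, k=3):
    """Fraction of the k nearest training points that are labelled 1."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    return sum(label for _, label in nearest) / k

def loocv_scores(X, y, k=3):
    """Leave-one-out: score each sample using only the other samples."""
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]   # hold out sample i
        train_y = y[:i] + y[i + 1:]
        scores.append(knn_score(train_X, train_y, X[i], k))
    return scores

def accuracy(y, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def auc(y, scores):
    """Probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: two well-separated groups (0 = non-DN, 1 = DN).
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
     (2.0, 2.1), (2.2, 1.9), (1.9, 2.3), (2.3, 2.2)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

scores = loocv_scores(X, y, k=3)
print("LOOCV accuracy:", accuracy(y, scores))
print("LOOCV AUC:", auc(y, scores))
```

With 133 respondents, leave-one-out means 133 fits per model, each tested on a single held-out respondent. The held-out scores are pooled before computing AUC, because a single test sample cannot produce a ranking on its own.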

Full Text