Abstract

Introduction: Japan holds a large amount of specific health examination data. Efforts to use these data to predict the future onset of cardiovascular diseases are important for improving prognosis and quality of life (QOL), as well as from the perspective of healthcare economics. We evaluated the ability of machine learning algorithms to predict new-onset chronic kidney disease (CKD) over 5 years.
Methods: Among individuals who underwent examinations from 2008 to 2018, those with data available at baseline (year X), 1 year later (year X+1), and 5 years later (year X+5) were included. CKD was defined as an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m². Individuals with CKD in year X or X+1 were excluded. Three machine learning algorithms, Random Forest (RF), Logistic Regression (LR), and Multilayer Perceptron (MLP), were assessed for predicting CKD in year X+5 from single-year data (year X) or from data for two consecutive years (years X and X+1), and their predictive ability was compared. Anthropometric measurements, blood pressure and pulse rate, laboratory measurements, and lifestyle-related indices were included as explanatory variables.
Results: A total of 43,084 participants (age 53.7±10.1 years; 49.9% male) were included, and 2,928 (6.8%) had CKD in year X+5. Predictive performance with single-year data was as follows: RF, AUC 0.88, accuracy 0.93, recall 0.99, precision 0.93; LR, AUC 0.89, accuracy 0.93, recall 0.99, precision 0.94; MLP, AUC 0.89, accuracy 0.93, recall 0.99, precision 0.93. Predictive performance with two-year data was as follows: RF, AUC 0.93, accuracy 0.95, recall 0.99, precision 0.95; LR, AUC 0.90, accuracy 0.97, recall 0.99, precision 0.97; MLP, AUC 0.89, accuracy 0.97, recall 0.99, precision 0.97. The variables with the highest feature importance were serum creatinine (0.41), age (0.13), medication for hypertension (0.04), and systolic blood pressure (0.03).
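The cohort construction described in the Methods (inclusion, exclusion, outcome label, single-year versus two-year features) can be sketched as below. This is a minimal, stdlib-only illustration, not the authors' code. Several parts are assumptions: the abstract does not name the eGFR equation, so `egfr_jsn` assumes the Japanese Society of Nephrology equation (194 × Cr^−1.094 × age^−0.287, × 0.739 for women); `Visit`, `pick_baseline`, and `build_dataset` are hypothetical names; the feature set is reduced to four variables; and choosing the earliest qualifying year as year X is a hypothetical rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

CKD_EGFR_CUTOFF = 60.0  # CKD: eGFR < 60 mL/min/1.73 m^2, as defined in the abstract


def egfr_jsn(creatinine_mg_dl: float, age_years: float, female: bool) -> float:
    """JSN eGFR equation (an assumption: the abstract does not name the equation used)."""
    egfr = 194.0 * (creatinine_mg_dl ** -1.094) * (age_years ** -0.287)
    return egfr * 0.739 if female else egfr


@dataclass
class Visit:
    """One hypothetical examination record; the study used many more variables."""
    age: float
    female: bool
    creatinine: float  # serum creatinine, mg/dL
    sbp: float         # systolic blood pressure, mmHg

    def has_ckd(self) -> bool:
        return egfr_jsn(self.creatinine, self.age, self.female) < CKD_EGFR_CUTOFF


def pick_baseline(visits: Dict[int, Visit]) -> Optional[int]:
    """Hypothetical rule: the earliest year X for which X, X+1 and X+5 all exist."""
    for year in sorted(visits):
        if year + 1 in visits and year + 5 in visits:
            return year
    return None


def build_dataset(records: Dict[str, Dict[int, Visit]],
                  two_year: bool) -> Tuple[List[List[float]], List[int]]:
    """Return (feature rows, labels); label 1 means CKD in year X+5."""
    rows: List[List[float]] = []
    labels: List[int] = []
    for visits in records.values():
        x = pick_baseline(visits)
        if x is None:
            continue  # inclusion: years X, X+1 and X+5 must all be available
        base, nxt, outcome = visits[x], visits[x + 1], visits[x + 5]
        if base.has_ckd() or nxt.has_ckd():
            continue  # exclusion: CKD already present in year X or X+1
        row = [base.age, float(base.female), base.creatinine, base.sbp]
        if two_year:
            row += [nxt.creatinine, nxt.sbp]  # append the year X+1 measurements
        rows.append(row)
        labels.append(int(outcome.has_ckd()))
    return rows, labels
```

In the two-year setting this sketch simply appends the year X+1 values to the year X features; the abstract does not state whether the study used raw values, year-to-year differences, or both.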
Conclusions: New-onset CKD was predicted better with data from two consecutive years than with data from a single year. With two-year data, the AUCs ranged from 0.89 to 0.93, generally better than those of previously reported risk models based on statistical methods. Serum creatinine, together with age and blood pressure, was identified as a major contributor.
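The abstract reports AUC alongside accuracy, recall, and precision. AUC does not depend on a decision threshold, whereas the other three depend both on the threshold and on which class is treated as "positive", and the abstract does not state either choice. For context, with 2,928 of 43,084 participants (6.8%) developing CKD, a classifier that predicted "no CKD" for everyone would already reach an accuracy of about 0.93. The sketch below computes these metrics in plain Python; `roc_auc` and `threshold_metrics` are hypothetical helper names, and the 0.5 threshold is an assumption.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    Mann-Whitney formulation; ties count as half a win. The pairwise loop is
    O(n_pos * n_neg), fine for illustration but slow for a cohort of ~43,000.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5, positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision and recall after thresholding (score >= threshold -> 1).

    `positive` selects which class counts as positive for precision and recall.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For example, with labels `[1, 1, 0, 0, 0]` and scores `[0.9, 0.3, 0.35, 0.2, 0.1]`, `roc_auc` returns 5/6 and the accuracy is 0.8 either way, but recall is 0.5 when class 1 is positive and 1.0 when class 0 is positive.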
