Abstract Background and Aims Albuminuria is a medical condition characterized by leakage of albumin into the urine due to kidney damage. The condition is diagnosed by an elevated urine albumin-to-creatinine ratio (ACR) and is typically categorized into microalbuminuria (30 ≤ ACR < 300 mg/g) and macroalbuminuria (ACR ≥ 300 mg/g). Currently, despite compelling data, only a minority of patients with diabetes, and rarely individuals without diabetes, are screened for albuminuria in a systematic way. ACR tests also show high intra-person variability, which makes it harder to identify meaningful biologic changes and increases the complexity of clinical trials. In this study we developed a machine learning method to predict ACR level from Electronic Health Records, excluding urine tests, and validated the model for identifying patients with albuminuria. Identifying patients with undiagnosed albuminuria could help slow progression of kidney disease and could be used to speed up recruitment to, and reduce screen failures in, clinical trials. Method We developed a Quantile Regression model for ACR using the US Limited IBM MarketScan Explorys Claims-EMR Data Set (LCED). We included subjects who had an ACR test (0 < ACR ≤ 5000 mg/g) and were at least 18 years of age. Patient demographics (age, sex), vital signs (BMI, blood pressure) and 8 common blood tests (albumin, bilirubin, creatinine, HbA1C, triglyceride, glucose, white blood cell count and ALT) were used as covariates. A tree-based gradient boosting framework (LightGBM) was used to train the quantile regression models, namely for the 25th, 50th, and 75th percentiles of ACR conditioned on the covariates. The model was then validated on all qualified subjects in Optum's de-identified Clinformatics® Data Mart Database (2007-2021). We evaluated performance using Area Under the Curve (AUC), precision (PPV), specificity (TNR), and sensitivity (TPR).
Finally, we used Kaplan-Meier estimates to compare the risk of progression to kidney failure, as identified by ICD codes (Chronic kidney disease, stage 5; End stage renal disease; Dependence on renal dialysis; Unspecified kidney failure; Kidney transplant), across both predicted and measured ACR values. Results A final cohort of 63,459 individuals matched the inclusion and exclusion criteria in LCED, and 5,857,385 individuals did so in Optum. Using the 25% quantile to predict patients at risk, the model consistently reaches a PPV greater than 0.8 and a specificity (TNR) greater than 0.99 (Table 1). The risk of progression to kidney failure increases with both increased predicted and increased measured ACR (Figure 1). Conclusion The results show that the models have discriminative power in all datasets. They predict both micro- and macroalbuminuria with a PPV above 80% when the 25% quantile is used. However, classification performance is lacking in sensitivity, i.e., subjects suffering from albuminuria may not be classified as such. By using the median prediction of ACR we identify patient subpopulations whose risk of kidney failure is at least on par with that of the targeted subpopulation defined by measured ACR. This means that the method can be used confidently to identify at-risk individuals. Our model is therefore advantageous in applications such as identifying undiagnosed albuminuria and pre-screening for clinical trials, where high PPV is more important than sensitivity. We intend to validate the model further for outcomes prediction in an upcoming CKD trial.
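The trade-off described above, high PPV at the cost of sensitivity when the 25% quantile is used as the decision value, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 30 mg/g flagging rule applied to the predicted quantile, and the toy ACR values are all assumptions made for illustration. The pinball loss shown is the standard objective minimized in quantile regression (in LightGBM it is selected with `objective="quantile"` and the `alpha` parameter); at a quantile level of 0.25 it penalizes over-prediction three times as heavily as under-prediction, which is why the resulting predictions are conservative.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss at quantile level q, with 0 < q < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff >= 0) costs q per unit,
        # over-prediction (diff < 0) costs (1 - q) per unit.
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def classification_metrics(measured_acr, predicted_acr, threshold=30.0):
    """PPV, specificity (TNR) and sensitivity (TPR) for the rule
    'flag the subject if the predicted ACR quantile is >= threshold'."""
    tp = fp = tn = fn = 0
    for measured, predicted in zip(measured_acr, predicted_acr):
        actual = measured >= threshold   # albuminuria by measured ACR
        flagged = predicted >= threshold  # albuminuria by predicted quantile
        if flagged and actual:
            tp += 1
        elif flagged and not actual:
            fp += 1
        elif not flagged and actual:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "tnr": tn / (tn + fp) if tn + fp else nan,
        "tpr": tp / (tp + fn) if tp + fn else nan,
    }


# Hypothetical values in mg/g, invented for illustration only.
measured = [12.0, 45.0, 350.0, 8.0, 90.0, 25.0, 600.0, 33.0]
predicted_q25 = [5.0, 20.0, 310.0, 4.0, 40.0, 10.0, 280.0, 12.0]

metrics = classification_metrics(measured, predicted_q25)
print(metrics)
```

On this toy data every flagged subject has measured albuminuria, while two subjects with mildly elevated ACR are missed, mirroring the pattern of high precision and limited sensitivity reported in the abstract.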