Abstract

Background and Aims: Autosomal dominant polycystic kidney disease (ADPKD) is an inherited disorder characterized by the development of cysts in both kidneys, leading to renal enlargement and a gradual decline in renal function. It is the leading inherited cause of end-stage renal disease (ESRD), with an estimated prevalence of 1:1000. The estimated glomerular filtration rate (eGFR) is an estimate of kidney function that can be calculated with several equations. This study uses the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI, 2009) equation, which includes serum creatinine level, age, and gender. eGFR is also frequently used to assess the overall health of patients with ADPKD, classify disease stages, and determine appropriate therapeutic interventions. Because clinical symptoms of ADPKD typically manifest in later disease stages, early diagnosis is crucial for slowing disease progression and improving patients' quality of life. This study has two main objectives: (1) to identify molecular and clinical parameters associated with the annual eGFR decline and (2) to develop a predictive model for future eGFR in patients with ADPKD using serum proteomics. By combining proteomic and clinical parameters, this research aims to advance more effective strategies for the diagnosis, stratification, and treatment of ADPKD.

Method: Using retrospective eGFR values (n = 7187) of patients with ADPKD (n = 1291) recruited from the AD(H)PKD patient cohort, the annual decline in eGFR (slope) was calculated. Inclusion criteria were a minimum of three eGFR values and the absence of tolvaptan use (a medication that inhibits the mechanisms underlying eGFR decline). Slopes were determined by robust linear regression. Prior to the analyses, patients whose slopes were clinically extreme (< −10 or > 5 mL/min/1.73 m² per year) were excluded. A total of 2469 eGFR values from 219 ADPKD patients were included in this study.
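The slope calculation and exclusion rules described above can be sketched in a few lines. The abstract does not name the robust estimator that was used, so this sketch substitutes the Theil-Sen estimator (the median of all pairwise slopes) as one simple outlier-resistant choice. The function names, constants, and patient values are hypothetical and serve only as illustration.

```python
from itertools import combinations
from statistics import median

MIN_MEASUREMENTS = 3         # inclusion criterion stated in the abstract
SLOPE_LIMITS = (-10.0, 5.0)  # mL/min/1.73 m² per year; outside = clinically extreme


def theil_sen_slope(years, egfr):
    """Median of all pairwise slopes (Theil-Sen).

    Assumption: a stand-in for the unspecified robust regression in the study.
    """
    slopes = [
        (e2 - e1) / (t2 - t1)
        for (t1, e1), (t2, e2) in combinations(zip(years, egfr), 2)
        if t2 != t1  # skip pairs measured at the same time point
    ]
    return median(slopes)


def annual_slope(years, egfr, on_tolvaptan=False):
    """Return the annual eGFR slope, or None if the patient is excluded."""
    if on_tolvaptan or len(egfr) < MIN_MEASUREMENTS:
        return None
    slope = theil_sen_slope(years, egfr)
    low, high = SLOPE_LIMITS
    if slope < low or slope > high:
        return None  # clinically extreme slope
    return slope


# Hypothetical patient: four visits with one aberrant reading at year 2.
years = [0.0, 1.0, 2.0, 3.0]
egfr = [90.0, 87.0, 60.0, 81.0]
print(annual_slope(years, egfr))  # -3.0
```

In this toy series the aberrant value at year 2 does not move the estimate, because the median ignores the three pairwise slopes that involve it. An ordinary least-squares fit to the same points would be pulled toward a steeper decline.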
To derive a feature set (FS) suitable for slope prediction, the least absolute shrinkage and selection operator (LASSO) was applied independently to two subsets of the serum proteome (SP) data: (i) SP alone and (ii) SP in conjunction with key clinical variables: age, gender, eGFR, and Mayo class (SPMC). Linear regression (LR) models were constructed separately from each FS (SP and SPMC), and cross-validation (CV) was used to guard against overfitting. Prediction models were then generated, compared in terms of their ability to predict slopes, and visualized for an overview.

Results: Feature selection identified four features associated with slope in both the SP and SPMC data. Three of those features were proteomic and common to both feature sets (SP and SPMC). The R² values of the LR models were stable across the CV test sets, indicating no sign of overfitting. Therefore, the LR slope prediction models were built with the SP and SPMC feature sets using all available data. The adjusted R² values of the SP and SPMC LR models were 0.274 (R² = 0.287, F(4,213) = 21.48, p = 6.74 × 10⁻¹⁵) and 0.301 (R² = 0.317, F(5,208) = 19.31, p = 8.94 × 10⁻¹⁶), respectively. The models were then used to predict the slopes of the same patients (internal validation), and the predicted slopes were compared with the observed slopes to examine the models' predictive capacity. The comparison showed that the LR models predict values closer to the observed slope when the slope lies between −5 and 0 mL/min/1.73 m² per year. Most observed slopes fell within this range, which explains why the models perform well there. However, the deviation between predicted and observed slopes increases as slopes fall outside this range.

Conclusion: The identified biomarkers could be used to predict how the disease will progress; however, fine-tuning of the models might be necessary.
Even though the cohort included patients with positive and extremely negative eGFR slopes, our models predict slopes well within the range of −5 to 0 mL/min/1.73 m² per year. These extreme cases might also be responsible for the models' reduced accuracy outside that range. One possible solution could be to use a weighted LASSO to reduce the effect of "extreme" slopes on the models.
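To make the LASSO-based feature selection and the weighted variant proposed here more concrete, the following is a minimal sketch of a sample-weighted LASSO fitted by coordinate descent. It is not the study's implementation. The weighting scheme (`slope_weights`), the penalty value, and the toy data are all assumptions chosen for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def weighted_lasso(X, y, weights, penalty, n_sweeps=200):
    """Minimise sum_i w_i * (y_i - b0 - x_i.b)^2 / (2 * sum(w)) + penalty * sum_j |b_j|."""
    n_features = len(X[0])
    total_w = sum(weights)
    beta = [0.0] * n_features
    intercept = 0.0
    for _ in range(n_sweeps):
        # The intercept is unpenalised: weighted mean of the current residuals.
        intercept = sum(
            w * (yi - sum(b * x for b, x in zip(beta, row)))
            for w, yi, row in zip(weights, y, X)
        ) / total_w
        for j in range(n_features):
            rho = 0.0  # weighted covariance of feature j with the partial residual
            z = 0.0    # weighted second moment of feature j
            for w, yi, row in zip(weights, y, X):
                fitted = sum(b * x for b, x in zip(beta, row))
                partial = yi - intercept - (fitted - beta[j] * row[j])
                rho += w * row[j] * partial
                z += w * row[j] ** 2
            beta[j] = soft_threshold(rho / total_w, penalty) / (z / total_w) if z else 0.0
    return intercept, beta


def slope_weights(slopes, low=-5.0, high=0.0, outside=0.5):
    """Hypothetical scheme: down-weight slopes outside the well-predicted range."""
    return [1.0 if low <= s <= high else outside for s in slopes]


# Toy data: the slope depends on feature 0; feature 1 is almost pure noise.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-5.8, -4.6, -3.0, -1.5, -0.1]

b0, beta = weighted_lasso(X, y, [1.0] * len(y), penalty=0.1)
print(b0, beta)  # feature 1 is dropped (coefficient exactly 0.0)

b0_w, beta_w = weighted_lasso(X, y, slope_weights(y), penalty=0.1)
print(b0_w, beta_w)
```

With equal weights this reduces to an ordinary LASSO, and the features with non-zero coefficients form the selected feature set. In practice an established library and a cross-validated penalty would be preferable to this hand-rolled loop. How strongly to down-weight extreme slopes is a modelling decision that the abstract leaves open.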