Abstract Background and Aims To improve risk stratification in kidney transplantation, molecular diagnostic tools are increasingly investigated for their feasibility and their gain in diagnostic power. However, previously published studies have relied on technologies and gene panels that offer only a limited view of the transcriptome (pathogenesis-based transcripts [PBTs], Banff Human Organ Transplant [BHOT] panel). The EU-TRAIN consortium was established to discover new predictive and informative biomarkers for kidney transplant histology and rejection diagnoses using next-generation sequencing. Method EU-TRAIN (NCT03652402) is a prospective multicentre study of unselected kidney transplant cohorts from 11 centres in 4 countries (France, Spain, Germany, Switzerland). We performed bulk RNA sequencing on the polyadenylated probes of 770 kidney biopsies (n = 540 kidney recipients) collected between 2018 and 2021. Differential gene expression analyses were performed to obtain a molecular signature for each Banff lesion score. We then derived three feature selections by i) training an ElasticNet model on all differentially expressed genes (DEGs), ii) taking the top 30 DEGs overall, or iii) taking the top 30 DEGs among the transcripts included in the BHOT gene panel. On each selection, we trained four machine learning (ML) classifiers using 10-times repeated 3-fold cross-validation. Model performance was assessed on a hold-out test set comprising 30% of the total samples. Finally, we derived prototypic histological profiles by applying an archetypal analysis to the Banff score probabilities predicted for each sample by the best classifier. Results The ElasticNet feature selection reduced the number of DEGs to be included from a range of [859; 10,839] to a range of [52; 867]. These selections comprised a mixture of PBTs [55.2%; 84.2%], BHOT genes [2.2%; 13.5%] and new transcripts [12.4%; 37.2%].
Based on these findings, four ML classifiers (Naïve Bayes, Extreme Gradient Boosting, Linear Support Vector Machine and K-Nearest Neighbours) were trained on each of the three feature selections, and their performance in predicting Banff lesion scores was compared using the area under the precision-recall curve (PRAUC). In all settings, the ElasticNet feature selection outperformed the other two methods, with PRAUC gains ranging from 0.068 (t) to 0.726 (cg). Excluding the ah score, the best discrimination was obtained with the Linear Support Vector Machine, with PRAUC values ranging from 0.708 (t) to 0.980 (ptc). Excluding cv and ah, all models were well calibrated (Hosmer-Lemeshow goodness-of-fit p-value > 0.05). The archetypal analysis yielded 8 profiles: acute and chronic antibody-mediated rejection (presence of circulating donor-specific antibodies, C4d deposition, g, ptc and cg lesions), acute T-cell-mediated rejection (TCMR; i, t and ti lesions), chronic TCMR (i, t, ti, ci, ct and i-IFTA lesions), mixed rejection (g, ptc, i, t and ti lesions), vascular injuries (cv and ah), fibrosis (ci and ct lesions, older donors with a history of hypertension), minimal fibrotic change (ci and ct) and minor changes (no lesions). Conclusion Using feature selections that include new transcripts, we developed models that accurately predict Banff lesion scores, and we identified 8 profiles among these predictions. External validation and the association of the archetypes with graft loss will be addressed in future work.
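The evaluation protocol described above (a 30% hold-out test set, 10-times repeated 3-fold cross-validation on the remaining samples, and comparison by PRAUC) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the consortium's pipeline: the stratified split, the random seeds, the function names and the step-wise (average precision) estimate of the PRAUC are all hypothetical choices, since the abstract does not specify them.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.30, seed=0):
    """Split sample indices into train/test sets.

    Assumption: the split is stratified on the binary lesion label,
    which the abstract does not state.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def repeated_kfold(indices, n_splits=3, n_repeats=10, seed=0):
    """Yield (train, validation) index lists for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)  # a fresh partition for every repeat
        folds = [shuffled[k::n_splits] for k in range(n_splits)]
        for k in range(n_splits):
            val = folds[k]
            trn = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield trn, val


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Assumption: PRAUC is estimated as average precision. Tied scores
    are ranked in input order, which is a simplification.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("PRAUC is undefined without positive samples")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


if __name__ == "__main__":
    # Toy labels standing in for the presence/absence of one Banff lesion.
    toy_labels = [1] * 30 + [0] * 70
    train_idx, test_idx = stratified_holdout(toy_labels)
    n_cv_splits = sum(1 for _ in repeated_kfold(train_idx))
    print(len(train_idx), len(test_idx), n_cv_splits)
```

In such a setup, any feature selection and model tuning would be performed inside the cross-validation loop on the training indices only, and the hold-out indices would be used once to compute the reported PRAUC. PRAUC is a natural choice when a lesion is rare, because it is not inflated by the large number of true negatives.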