Abstract

Background and Aims We aim to derive and validate a score estimating the probability of identifying a variant that explains the renal phenotype using exome sequencing (ES) in adults with CKD of uncertain origin.

Method Participants: prospective cohort study including all consecutive index patients with CKD of uncertain origin in three French nephrology units (metropolitan: Sorbonne University Hospitals, Paris, and Conception Hospital, Marseille; overseas: La Réunion University Hospital, Réunion island) who underwent ES between October 11th, 2017 and May 31st, 2023. Outcome measure: identification of a causal variant using ES data, according to the American College of Medical Genetics and Genomics diagnostic criteria. Candidate variables and feature engineering: raw data on patients’ phenotypes were prospectively entered in a collection form by the prescribing physician at the time of ES. Structured data (yes/no questions and lists) were mapped to Human Phenotype Ontology (HPO) terms. Unstructured data (free text) were manually translated into HPO terms by a single investigator (a nephrologist), who was blinded to ES results. First-, second- and third-order ancestor HPO terms were included in the analysis. Score derivation: an optimal weighted average of multiple models (“ensemble model”) was specified, following guidelines provided in a recently published article (PMID:36905602). Score validation: internal validation using 10-fold cross-validation, followed by internal-external validation (a model was developed on data from two of the three centers and tested on data from the remaining center; this process was repeated with each center held out in turn).

Results We included 2,490 patients, 560/2,490 (22.5%) of whom had a causal variant identified by ES. We collected 1,028 distinct HPO terms describing the patients’ phenotypes. The most common term was “Hypertension”, occurring in 2,079/2,490 (83.5%) patients.
In internal validation, the score showed accurate calibration and discrimination (area under the receiver operating characteristic curve (AUC): 0.71, 95% CI 0.68 to 0.73; index of prediction accuracy (IPA): 0.11, 95% CI 0.09 to 0.13). Performance was moderately lower in internal-external validation but remained satisfactory (Tenon: AUC 0.64, IPA 0.04; Marseille: AUC 0.67, IPA 0.04; La Réunion: AUC 0.68, IPA 0.09).

Conclusion We derived a clinical score, validated both internally and internal-externally, that accurately predicts the probability of obtaining a genetic diagnosis using exome sequencing. Further feature engineering could improve the score's performance. Automatic computation by clinicians using an interactive tool is achievable and could be of clinical utility, especially to guide resource allocation when the pretest probability is low.
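The validation metrics and the internal-external scheme described above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the IPA is computed as one minus the ratio of the model's Brier score to that of a null model predicting the observed prevalence (here assumed to be the prevalence of the evaluated sample), and the AUC is computed as the proportion of positive/negative pairs ranked correctly.

```python
from collections import defaultdict


def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def ipa(y_true, y_prob):
    """Index of prediction accuracy: 1 - Brier(model) / Brier(null model).

    Assumption: the null model predicts the prevalence of the evaluated sample.
    """
    prevalence = sum(y_true) / len(y_true)
    null_brier = brier(y_true, [prevalence] * len(y_true))
    return 1.0 - brier(y_true, y_prob) / null_brier


def auc(y_true, y_prob):
    """Probability that a random case is scored above a random non-case.

    Ties count as one half. Quadratic in sample size, fine for illustration.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def leave_one_center_out(centers):
    """Yield (held_out_center, train_indices, test_indices) for each center.

    The model would be fitted on train_indices and evaluated on test_indices.
    """
    by_center = defaultdict(list)
    for i, center in enumerate(centers):
        by_center[center].append(i)
    for held_out, test_idx in by_center.items():
        train_idx = [i for c, idx in by_center.items() if c != held_out for i in idx]
        yield held_out, train_idx, test_idx
```

Under this definition, an IPA of 0 corresponds to a model no better than predicting the overall prevalence for every patient, and an IPA of 1 to perfect predictions, which gives a scale for reading the reported values of 0.04 to 0.11.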