Abstract

Background and Aims
The relapsing-remitting, multi-system pattern of disease in ANCA-associated vasculitis (AAV) results in incremental tissue injury. In patients with renal involvement, the risk of end-stage kidney disease is 9-fold higher after a renal relapse. Relapse is defined by a Birmingham Vasculitis Activity Score (BVAS v3) >0, particularly in the clinical trial setting. However, BVAS may be missing or incorrectly scored in real-world registry data, so ascertainment of this key outcome can be incomplete or inaccurate. We aimed to develop, internally validate and evaluate a pragmatic, data-driven algorithm that automates the retrospective identification of AAV relapse in real-world data.

Method
The Rare Kidney Disease (RKD) Registry is a national, longitudinal, multi-centre cohort study that includes 663 patients with AAV; those with >6 months of follow-up after diagnosis were eligible for inclusion. We developed and validated the algorithm in five steps: 1) independent expert adjudication of encounters, using primary medical record information, to assign the reference probability of relapse (ground truth); 2) selection of data elements and corresponding value sets, informed by literature review, expert opinion and likely data availability; 3) development of a computable phenotype definition with an embedded multi-level logistic regression model, fitted by complete case analysis; 4) internal validation; 5) development of additional models, using the same method, to account for combinations of missing variables (models described in Fig. 1). We also developed a Shiny web application that implements the final algorithm: it selects the appropriate model based on the variables available and outputs an individualised probability of relapse, with a suggested binary interpretation.

Results
In the first step of the algorithm, encounters with diagnostic histopathology were labelled as relapse.
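The two-stage structure described above (histopathology first, then a model chosen according to which data elements are present) can be illustrated with a minimal Python sketch. It assumes that a histopathology flag short-circuits to relapse, that otherwise the model using the most data elements that are all present is chosen, and that the probability is the inverse logit of that model's fixed-effects linear predictor, with a new patient's random effect taken as zero. The keys (`anca_change`, `bloods_urine`, `imaging`, `is_status`, `is_response`), the names `Model`, `select_model` and `assess`, every coefficient, and every cut-point other than 0.48 are hypothetical placeholders; the fitted models and the actual selection rule are not reported in this abstract, and the reduced models here only loosely echo the DE2 plus DE1-or-DE3 requirement reported in the Results, not the models in Fig. 1.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Model:
    intercept: float
    coefs: Dict[str, float]  # data element -> log-odds coefficient (hypothetical)
    cut_point: float

    @property
    def variables(self) -> frozenset:
        return frozenset(self.coefs)


# Illustrative models only: all coefficients, and all cut-points except 0.48,
# are invented placeholders rather than values fitted in the study.
MODELS: List[Model] = [
    Model(-3.0, {"anca_change": 1.2, "bloods_urine": 1.5, "imaging": 1.0,
                 "is_status": -0.5, "is_response": 2.5}, 0.48),
    Model(-2.5, {"anca_change": 1.3, "bloods_urine": 1.8, "imaging": 1.1}, 0.50),
    Model(-2.2, {"anca_change": 1.4, "bloods_urine": 1.9}, 0.50),
    Model(-2.2, {"bloods_urine": 1.9, "imaging": 1.2}, 0.50),
]


def select_model(available: frozenset) -> Optional[Model]:
    """Pick the model using the most data elements, all of which are available."""
    usable = [m for m in MODELS if m.variables <= available]
    return max(usable, key=lambda m: len(m.variables), default=None)


def assess(encounter: Dict[str, Optional[float]],
           histopathology: bool = False) -> Optional[Tuple[float, bool]]:
    """Return (probability of relapse, binary call), or None if no model fits."""
    if histopathology:
        return 1.0, True  # step 1: diagnostic histopathology => relapse
    available = frozenset(k for k, v in encounter.items() if v is not None)
    model = select_model(available)
    if model is None:
        return None  # this pattern of missingness is not covered by any model
    # Fixed-effects linear predictor only (random effect assumed to be 0).
    eta = model.intercept + sum(c * encounter[k] for k, c in model.coefs.items())
    p = 1.0 / (1.0 + math.exp(-eta))
    return p, p >= model.cut_point
```

For example, an encounter with every element present except `is_response` falls back to the hypothetical three-element model, and one with only `anca_change` returns `None`. Preferring the largest applicable model is one plausible reading of "determines the appropriate model based on available variables", not a documented rule.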
For encounters without histopathological confirmation, we selected five objective data elements to build the model: change in ANCA level, suggestive blood/urine tests, suggestive imaging, immunosuppressive (IS) status at the time of the encounter, and the change in IS made in response (‘IS response’) (Fig. 2). The development and validation datasets comprised 1209 and 377 separate encounters, respectively. An optimal cut-point of 0.48 was determined by maximising the F1-score (0.85) for the complete 5-variable model. Sensitivity and specificity were 0.91 and 0.95, respectively. Performance metrics were stable across fifty random-split resamples, and calibration-in-the-large was satisfied. Where ‘IS response’ was missing, ‘suggestive bloods/urine’ (Data Element [DE]2) together with at least one of ‘ANCA level’ (DE1) or ‘suggestive imaging’ (DE3) was required to achieve accuracy as good as the gold standard BVAS (Fig. 1).

Conclusion
In settings where accurate BVAS may not be available, this algorithm accurately quantifies the individualised probability of AAV relapse using objective, readily accessible registry data. In addition to the web application, the model can be embedded directly in a registry database. This framework could serve as an exemplar for other relapsing-remitting diseases, and for automating the identification of other key outcomes or cohorts in registry data.
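Choosing a cut-point by maximising the F1-score, as reported for the complete model, can be sketched as a grid search over candidate thresholds. This minimal Python sketch assumes binary reference labels from expert adjudication (relapse = 1) and predicted probabilities from an already fitted model; the function names, the 0.01-step grid and the tie-breaking rule (the lowest cut-point with the highest F1 wins) are assumptions for illustration, not details taken from the study.

```python
from typing import Dict, Optional, Sequence, Tuple


def confusion(y_true: Sequence[int], y_prob: Sequence[float],
              cut: float) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN when a probability >= cut is called a relapse."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        pred = p >= cut
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(y_true: Sequence[int], y_prob: Sequence[float],
            cut: float) -> Dict[str, float]:
    """F1-score, sensitivity and specificity at a given cut-point."""
    tp, fp, fn, tn = confusion(y_true, y_prob, cut)
    return {
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def best_cut_point(y_true: Sequence[int], y_prob: Sequence[float],
                   grid: Optional[Sequence[float]] = None) -> float:
    """Return the grid cut-point with the highest F1 (max keeps the first, i.e. lowest, on ties)."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda c: metrics(y_true, y_prob, c)["f1"])
```

With toy labels `[1, 1, 1, 0, 0, 0]` and probabilities `[0.9, 0.7, 0.45, 0.5, 0.2, 0.1]`, the search returns 0.21, where F1 is 6/7, sensitivity 1.0 and specificity 2/3. Repeating the evaluation over random train/validation splits would be one way to check stability, but the resampling scheme used in the study is not described here.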