Developing and optimizing a computable phenotype for incident venous thromboembolism in a longitudinal cohort of patients with cancer

Ang Li, Wilson L Da Costa, Danielle Guffey, Emily M Milner, Anthony K Allam, Karen M Kurian, Francisco J Novoa, Marguerite D Poche, Raka Bandyo, Carolina Granada, Courtney D Wallace, Neil A Zakai, Christopher I Amos

doi:10.1002/rth2.12733

Abstract

Background: Research on venous thromboembolism (VTE) that relies only on International Classification of Diseases (ICD) codes can misclassify outcomes. Our study aims to discover and validate an improved VTE computable phenotype for people with cancer.

Methods: We used a cancer registry electronic health record (EHR)–linked longitudinal database. We derived three algorithms that were ICD/medication based, natural language processing (NLP) based, or all combined. We then randomly sampled 400 patients from patients with VTE codes (n = 1111) and 400 from those without VTE codes (n = 7396). Weighted sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated on the entire sample using inverse probability weighting, followed by bootstrapped receiver operating characteristic curve analysis to calculate the concordance statistic (c statistic).

Results: Among 800 patients sampled, 280 had a confirmed acute VTE during the first year after cancer diagnosis. The ICD/medication algorithm had a weighted PPV of 95% and a weighted sensitivity of 81%, with a c statistic of 0.90 (95% confidence interval [CI], 0.89–0.91). Adding Current Procedural Terminology codes for inferior vena cava filter removal or early death did not improve the performance. The NLP algorithm had a weighted PPV of 80% and a weighted sensitivity of 90%, with a c statistic of 0.93 (95% CI, 0.92–0.94). The combined algorithm had a weighted PPV of 98% at the higher cutoff and a weighted sensitivity of 96% at the lower cutoff, with a c statistic of 0.98 (95% CI, 0.97–0.98).

Conclusions: Our ICD/medication‐based algorithm can accurately identify the VTE phenotype among patients with cancer, with a high PPV of 95%. The combined algorithm should be considered in EHR databases that have access to such capabilities.
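The inverse probability weighting step can be illustrated with a minimal sketch. It assumes each reviewed patient is weighted by the reciprocal of the sampling fraction of their stratum (1111/400 for patients with VTE codes, 7396/400 for those without), which is one common way to apply the technique and may differ from the authors' exact implementation. The `Review` record, the function names, and the toy reviews are hypothetical and are not study data; only the stratum sizes come from the abstract.

```python
from dataclasses import dataclass

# Stratum sizes reported in the abstract.
POPULATION = {"coded": 1111, "uncoded": 7396}
SAMPLED = {"coded": 400, "uncoded": 400}


@dataclass(frozen=True)
class Review:
    """One chart-reviewed patient (hypothetical record layout)."""
    stratum: str      # "coded" or "uncoded" (VTE code present or absent)
    predicted: bool   # algorithm flagged VTE
    confirmed: bool   # chart review confirmed acute VTE


def sampling_weight(stratum: str) -> float:
    """Inverse of the probability that a patient in this stratum was sampled."""
    return POPULATION[stratum] / SAMPLED[stratum]


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def weighted_metrics(reviews):
    """Accumulate a weighted 2x2 table and derive the four metrics."""
    tp = fp = fn = tn = 0.0
    for r in reviews:
        w = sampling_weight(r.stratum)
        if r.predicted and r.confirmed:
            tp += w
        elif r.predicted and not r.confirmed:
            fp += w
        elif not r.predicted and r.confirmed:
            fn += w
        else:
            tn += w
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Toy reviews for illustration only; these are not study data.
    toy = (
        [Review("coded", True, True)] * 3
        + [Review("coded", True, False)]
        + [Review("uncoded", False, True)]
        + [Review("uncoded", False, False)] * 5
    )
    for name, value in weighted_metrics(toy).items():
        print(f"{name}: {value:.3f}")
```

In the toy data, the single missed VTE sits in the uncoded stratum, where each sampled patient stands in for many more patients than in the coded stratum. The weighted sensitivity is therefore lower than the unweighted 3/4, which is the kind of correction the weighting is meant to provide when the two strata are sampled at different rates.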

Full Text