Abstract 13880: Natural Language Processing for Adjudication of Heart Failure Hospitalizations in a Multi-Center Clinical Trial

Jonathan Cunningham, Peter Finn, Pablo-Miki Marti Castellote, Anthony A Philippakis, Emily Lau, Puneet Batra, Akshay S Desai, Brian Claggett, Orly Vardeny, Jennifer E Ho, Patrick T Ellinor, Steven Lubitz, Mahnaz Maddah, Christopher Reeder, Pulkit Singh, Shaan Khurshid, Scott Solomon

doi:10.1161/circ.148.suppl_1.13880

Abstract

Background: The gold standard for outcome adjudication in clinical trials is chart review by a physician clinical events committee (CEC), which requires substantial time and expertise. Automated adjudication by natural language processing (NLP) may be a more resource-efficient alternative. We previously developed the Community Care Cohort Project (C3PO) NLP model to adjudicate heart failure (HF) hospitalization within one healthcare system.

Aim: To externally validate the C3PO NLP model against CEC adjudication in the INVESTED trial.

Methods: The INVESTED trial evaluated influenza vaccination in 5260 patients with cardiovascular disease at 157 North American sites. A central CEC of 21 physicians adjudicated the cause of hospitalizations from medical records. We applied the C3PO NLP model to 4060 INVESTED medical record dossiers and evaluated agreement between the NLP and final consensus CEC HF adjudications by kappa statistic, sensitivity, and specificity. We fine-tuned the C3PO NLP model (C3PO+INVESTED) and trained a de novo model on half of the INVESTED hospitalizations, and evaluated both models on the other half. NLP performance was benchmarked against CEC reviewer inter-rater reproducibility.

Results: The CEC considered 1074 hospitalizations (26%) to be HF. There was high agreement between the C3PO NLP and CEC HF adjudications (agreement 87%, kappa 0.69). C3PO NLP model sensitivity was 94% and specificity was 84%. The fine-tuned C3PO and de novo NLP models demonstrated agreement of 93% and kappa of 0.82 and 0.83, respectively. CEC inter-rater reproducibility was 94% (kappa 0.85).

Conclusion: A single-center NLP model for HF adjudication was accurate relative to the gold-standard CEC when externally validated in a multi-center clinical trial. Fine-tuning the model improved agreement to the level of human reproducibility. NLP may improve the efficiency of future multi-center clinical trials by accurately adjudicating clinical events at scale.
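As a rough illustration of how the agreement metrics named in the Methods relate to one another, the sketch below computes sensitivity, specificity, raw agreement, and Cohen's kappa from a 2x2 table of NLP versus CEC adjudications. The abstract does not report the cell counts; the counts used here are back-calculated from the rounded figures it does report (4060 dossiers, 1074 CEC-adjudicated HF, sensitivity 94%, specificity 84%) and should be read as an assumption made for illustration only, not as the trial's actual data. The function name `agreement_stats` is likewise hypothetical.

```python
def agreement_stats(tp, fp, fn, tn):
    """Agreement metrics for a binary classifier against a reference standard.

    tp/fn: reference-positive cases the model labelled positive/negative.
    tn/fp: reference-negative cases the model labelled negative/positive.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n  # raw (percent) agreement
    # Agreement expected by chance, from the marginal totals of each rater.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "agreement": observed,
        "kappa": kappa,
    }


# ASSUMPTION: illustrative counts reconstructed from the abstract's rounded
# percentages; the true confusion matrix is not given in the abstract.
total, cec_hf = 4060, 1074
tp = round(0.94 * cec_hf)            # CEC HF, NLP HF
fn = cec_hf - tp                     # CEC HF, NLP not HF
tn = round(0.84 * (total - cec_hf))  # CEC not HF, NLP not HF
fp = (total - cec_hf) - tn           # CEC not HF, NLP HF

stats = agreement_stats(tp, fp, fn, tn)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these reconstructed counts, raw agreement rounds to 0.87 and kappa to 0.69, in line with the values reported in the Results. Kappa is lower than raw agreement because it discounts the agreement that two raters would reach by chance given how often each assigns the HF label.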
