Background: Medical record review by a physician clinical events committee is the gold standard for identifying cardiovascular outcomes in clinical trials, but it is labor-intensive and poorly reproducible. Automated outcome adjudication by artificial intelligence (AI) could enable larger and less expensive clinical trials, but it has not been validated in global studies.

Methods: We developed a novel model for automated AI-based heart failure adjudication ("HF-NLP") using hospitalizations from three international clinical outcomes trials. The model was tested on potential heart failure hospitalizations from the DELIVER trial, a cardiovascular outcomes trial comparing dapagliflozin with placebo in 6063 patients with heart failure with mildly reduced or preserved ejection fraction. AI-based adjudications were compared with adjudications from a clinical events committee that followed FDA-based criteria.

Results: AI-based adjudication agreed with the clinical events committee for 83% of events. Routing only the events that the AI model deemed uncertain (16%) to human review would have achieved 91% agreement with the clinical events committee while reducing adjudication workload by 84%. The estimated effect of dapagliflozin on heart failure hospitalization was nearly identical under AI-based adjudication (hazard ratio 0.76 [95% CI 0.66-0.88]) and clinical events committee adjudication (hazard ratio 0.77 [95% CI 0.67-0.89]). For each medical record, the AI model extracted the symptoms, signs, and treatments of heart failure in tabular format and quoted the sentences documenting them.

Conclusions: AI-based adjudication of clinical outcomes has the potential to improve the efficiency of global clinical trials while preserving accuracy and interpretability.
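The human-review strategy reported in the Results can be illustrated with a minimal Python sketch of how agreement and workload reduction might be computed for such a hybrid workflow. This is not the HF-NLP implementation: the `Event` fields, the `hybrid_summary` function, and the toy events are hypothetical, and the sketch assumes that human reviewers of uncertain events reach the same decision as the clinical events committee.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One potential heart failure hospitalization (hypothetical schema)."""
    ai_label: bool      # AI adjudication: True = heart failure hospitalization
    ai_uncertain: bool  # True if the AI model flagged the event as uncertain
    cec_label: bool     # clinical events committee adjudication (reference)


def hybrid_summary(events):
    """Return (ai_only_agreement, hybrid_agreement, workload_reduction)."""
    n = len(events)
    if n == 0:
        raise ValueError("at least one event is required")

    # AI-only strategy: every event keeps the AI label.
    ai_agree = sum(e.ai_label == e.cec_label for e in events)

    # Hybrid strategy: uncertain events go to human review, which is ASSUMED
    # here to match the committee; all other events keep the AI label.
    hybrid_agree = sum(
        e.ai_uncertain or e.ai_label == e.cec_label for e in events
    )

    # Only uncertain events still require human adjudication.
    reviewed = sum(e.ai_uncertain for e in events)

    return ai_agree / n, hybrid_agree / n, 1 - reviewed / n


# Toy data for illustration only; these are not trial results.
events = [
    Event(ai_label=True, ai_uncertain=False, cec_label=True),
    Event(ai_label=False, ai_uncertain=False, cec_label=False),
    Event(ai_label=True, ai_uncertain=False, cec_label=False),
    Event(ai_label=True, ai_uncertain=True, cec_label=False),
    Event(ai_label=False, ai_uncertain=True, cec_label=False),
]

ai_only, hybrid, workload_reduction = hybrid_summary(events)
print(f"AI-only agreement:  {ai_only:.0%}")
print(f"Hybrid agreement:   {hybrid:.0%}")
print(f"Workload reduction: {workload_reduction:.0%}")
```

Under this assumption, workload reduction is simply one minus the fraction of events flagged as uncertain, which is consistent with the reported pairing of 16% uncertain events and an 84% reduction in adjudication workload.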