Can machine learning improve patient selection for cardiac resynchronization therapy?

Szu-Yeu Hu, Charlotta Lindvall, Alexander W Forsyth, Josh Haimson, Devvrat Malhotra, James A Tulsky, Daniel B Kramer, Neal A Chatterjee, Enrico Santus, Regina Barzilay, Giuseppe Coppola

doi:10.1371/journal.pone.0222397

Open Access

https://doi.org/10.1371/journal.pone.0222397

Abstract

Rationale: Multiple clinical trials support the effectiveness of cardiac resynchronization therapy (CRT); however, optimal patient selection remains challenging due to substantial treatment heterogeneity among patients who meet the clinical practice guidelines.

Objective: To apply machine learning to create an algorithm that predicts CRT outcome using electronic health record (EHR) data available before the procedure.

Methods and results: We applied machine learning and natural language processing to the EHR of 990 patients who received CRT at two academic hospitals between 2004 and 2015. The primary outcome was reduced CRT benefit, defined as <0% improvement in left ventricular ejection fraction (LVEF) 6–18 months post-procedure or death by 18 months. Data regarding demographics, laboratory values, medications, clinical characteristics, and past health services utilization were extracted from the EHR available before the CRT procedure. Bigrams (i.e., two-word sequences) were also extracted from the clinical notes using natural language processing. Patients accrued on average 75 clinical notes (SD, 29) before the procedure, including data not captured anywhere else in the EHR. A machine learning model was built using 80% of the patient sample (training and validation dataset) and tested on a held-out 20% patient sample (test dataset). Among the 990 patients receiving CRT, the mean age was 71.6 (SD, 11.8), 78.1% were male, 87.2% were non-Hispanic white, and the mean baseline LVEF was 24.8% (SD, 7.69). Of the 990 patients, 403 (40.7%) were identified as having a reduced benefit from the CRT device (<0% LVEF improvement in 25.2%, death by 18 months in 15.6%). The final model identified 26% of these patients at a positive predictive value of 79% (model performance: Fβ (β = 0.1): 77%; recall 0.26; precision 0.79; accuracy 0.65).

Conclusions: A machine learning model that leveraged readily available EHR data and clinical notes identified a subset of CRT patients who may not benefit from CRT before the procedure.
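The reported Fβ value can be checked against the stated precision and recall. The sketch below assumes the standard Fβ definition, Fβ = (1 + β²)·P·R / (β²·P + R); the function name `f_beta` is illustrative and does not come from the paper. With β = 0.1, precision is weighted far more heavily than recall, which is why the score sits close to the 0.79 precision despite a recall of only 0.26.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta < 1 favors precision; beta > 1 favors recall.
    (Assumed formula; the abstract reports the value but not the equation.)
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta must be > 0")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Values reported in the abstract.
precision, recall = 0.79, 0.26

score = f_beta(precision, recall, beta=0.1)
print(f"F(beta=0.1) = {score:.2f}")  # 0.77, matching the reported 77%

# For contrast, the balanced F1 score on the same numbers is much lower,
# because it weights the low recall equally with precision.
f1 = f_beta(precision, recall, beta=1.0)
print(f"F1 = {f1:.2f}")  # 0.39
```

Choosing a small β reflects a design goal stated in the abstract: flag only a subset of patients, but with a high positive predictive value.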

Highlights

Cardiac resynchronization therapy (CRT) is an established therapy for patients with medically refractory systolic heart failure and left ventricular dyssynchrony[1,2,3,4,5]
A machine learning model that leveraged readily available Electronic Health Records (EHR) data and clinical notes identified a subset of CRT patients who may not benefit from CRT before the procedure
… million billing diagnoses, 105 million medications, 200 million procedures, 852 million lab values, and over 5 million unstructured clinical notes, which include outpatient visit notes, inpatient admission and consultation notes, cardiology reports, and others

Summary

Introduction

Cardiac resynchronization therapy (CRT) is an established therapy for patients with medically refractory systolic heart failure and left ventricular dyssynchrony[1,2,3,4,5]. Improvement of left ventricular ejection fraction (LVEF) following CRT implant is associated with a reduction in heart failure hospitalizations and improved survival. Despite these established benefits, at least one-third of CRT patients do not experience an improvement in LVEF 6–18 months following the procedure[6]. Another subgroup of patients dies from heart failure or other comorbidities before the effectiveness of CRT can be measured. Machine learning algorithms that process thousands or even millions of variables hold the promise to improve on both the precision and usability of existing prediction models.

Methods

Results

Conclusion