Development and validation of algorithms to predict left ventricular ejection fraction class from healthcare claims data.

Damien Logeart, Maxime Doublet, Margaux Gouysse, François Roubille, Richard Isnard, Thibaud Damy

doi:10.1002/ehf2.14725

Abstract

Large medical or healthcare claims databases are valuable for population-based studies on the burden of heart failure (HF). Clinical characteristics and management of HF patients differ according to categories of left ventricular ejection fraction (LVEF), but this information is often missing from such databases. We aimed to develop and validate algorithms that identify LVEF category in healthcare databases where the information is lacking. Algorithms were built by machine learning with a random forest approach. They were trained and reinforced using the French national claims database [Système National des Données de Santé (SNDS)] and a French HF registry. Variables were age, sex, and comorbidities, which could be identified by medico-administrative code-based proxies: Anatomical Therapeutic Chemical codes for drug delivery, International Classification of Diseases (Tenth Revision) codes for hospitalizations, and administrative codes for any other type of reimbursed care. The algorithms were validated by cross-validation and against a subset of the SNDS that includes LVEF information. The areas under the receiver operating characteristic curve were 0.84 for the algorithm identifying LVEF≤40% and 0.79 for the algorithms identifying LVEF<50% and ≥50%. For LVEF≤40%, the reinforced algorithm identified 50% of patients in the validation dataset with a positive predictive value of 0.88 and a specificity of 0.96. The most important predictive variables were delivery of HF medication, sex, age, hospitalization, and testing for natriuretic peptides, with the order and direction (positive or negative) of their importance differing according to the LVEF category. The algorithms identify reduced or preserved LVEF in HF patients within a nationwide healthcare claims database with high positive predictive value and low rates of false positives.
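To make the reported validation figures concrete, the following minimal Python sketch shows how positive predictive value, specificity, and sensitivity are derived from a confusion matrix. The counts are hypothetical, chosen only so that the ratios match the values quoted above for LVEF≤40%; they are not the study's data. Reading "identified 50% of patients" as a sensitivity of 0.50 is also an assumption, and the function name `classification_metrics` is illustrative.

```python
# Hypothetical confusion-matrix counts, chosen only to reproduce the reported
# PPV (0.88) and specificity (0.96). They are NOT taken from the study.
def classification_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, and PPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true LVEF<=40% cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly left unflagged
        "ppv": tp / (tp + fp),          # share of flagged patients who are true cases
    }


# Assumed counts: 88 true cases (44 flagged), 150 non-cases (6 flagged).
metrics = classification_metrics(tp=44, fp=6, fn=44, tn=144)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts, only 6 of 150 non-cases are flagged (a false-positive rate of 0.04), which is what drives the high positive predictive value even though half of the true cases are missed.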

Full Text