Abstract

Background
Estimation of left ventricular filling pressures (LVFP) is important for accurate diagnosis and prognosis of heart failure (HF). The current ASE/EACVI algorithm for identifying elevated LVFP shows modest performance in the setting of preserved ejection fraction (EF) (1). Left atrial reservoir strain (LASr) has been proposed as a novel, additional diagnostic marker of elevated LVFP in HF with preserved EF (HFpEF) (1). While machine learning (ML) has shown good diagnostic performance using conventional echocardiographic findings (2), the incremental value of LASr has not been adequately explored in ML models.

Purpose
This study aimed to evaluate the performance of an ML model that integrates clinical and echocardiographic data, including LASr, to identify elevated LVFP in the setting of suspected HFpEF.

Methods
Patients with dyspnoea, EF ≥50% and sinus rhythm who underwent right heart catheterisation (RHC) and near-simultaneous echocardiography were selected from the KARUM database. Patients with atrial fibrillation, pacemakers, significant valvular disease or infiltrative cardiomyopathy were excluded. Elevated LVFP was defined as an invasive pulmonary artery wedge pressure (PAWP) ≥15 mmHg. The dataset was split into a training set (75%) for model development and a test set (25%) for unbiased evaluation of model performance. Missing numerical and categorical variables were imputed from the training data using Bayesian Ridge regression and K-Nearest Neighbours, respectively. Recursive Feature Elimination (RFE) was used to reduce the number of variables. A comprehensive grid search with repeated 10-fold cross-validation was used to optimise the hyperparameters of an XGBoost (eXtreme Gradient Boosting) model.
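The development steps described above (75/25 split, training-set-only imputation, RFE, and a cross-validated grid search scored by AUC) can be sketched as follows. This is a minimal illustration, not the study's code: it assumes scikit-learn and a hypothetical synthetic cohort (no study data). `GradientBoostingClassifier` stands in for XGBoost so the block runs without `xgboost` installed; `xgboost.XGBClassifier` implements the scikit-learn estimator API and could be swapped in. The stratified split, the ~5% missingness, rounding KNN-imputed categories, `n_features_to_select=10`, the grid values and `n_repeats=2` are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.linear_model import BayesianRidge
from sklearn.feature_selection import RFE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, train_test_split

rng = np.random.default_rng(0)

# Hypothetical synthetic cohort (NOT study data): 210 patients,
# 8 numerical + 4 binary categorical features; label 1 = elevated LVFP.
X_num, y = make_classification(n_samples=210, n_features=8, n_informative=4, random_state=0)
X_cat = rng.integers(0, 2, size=(210, 4)).astype(float)
X = np.hstack([X_num, X_cat])
X[rng.random(X.shape) < 0.05] = np.nan  # assumed ~5% missingness
num_idx, cat_idx = list(range(8)), list(range(8, 12))

# 75/25 split: test_size=0.25 of 210 gives 157 training and 53 test patients.
# Stratifying on the label is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Imputers are fitted on the training set only, then applied to both sets.
num_imp = IterativeImputer(estimator=BayesianRidge(), random_state=0)
cat_imp = KNNImputer(n_neighbors=5)  # neighbours are found using all columns

def impute(X_part, fit=False):
    if fit:
        num_imp.fit(X_part[:, num_idx])
        cat_imp.fit(X_part)
    num = num_imp.transform(X_part[:, num_idx])
    # KNNImputer averages neighbour values; rounding back to a valid code is an assumption.
    cat = np.round(cat_imp.transform(X_part)[:, cat_idx])
    return np.hstack([num, cat])

Xtr = impute(X_train, fit=True)
Xte = impute(X_test)

# RFE on the training set; the stand-in classifier exposes feature_importances_.
rfe = RFE(GradientBoostingClassifier(random_state=0), n_features_to_select=10).fit(Xtr, y_train)
Xtr_sel, Xte_sel = rfe.transform(Xtr), rfe.transform(Xte)

# Grid search scored by ROC AUC over repeated stratified 10-fold CV;
# the refitted best_estimator_ is the setting with the highest mean fold AUC.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50], "max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    scoring="roc_auc",
    cv=RepeatedStratifiedKFold(n_splits=10, n_repeats=2, random_state=0),
)
grid.fit(Xtr_sel, y_train)

# The held-out test set is touched only once, after model selection.
test_auc = roc_auc_score(y_test, grid.predict_proba(Xte_sel)[:, 1])
print(f"mean CV AUC {grid.best_score_:.3f}, test AUC {test_auc:.3f}")
```

Fitting the imputers and RFE only on the training rows keeps the test set unseen until the final evaluation, which is what makes the test-set AUC an unbiased estimate.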
The model with the highest mean area under the receiver operating characteristic curve (AUC) across cross-validation folds was selected as the final model, and its performance was reported on the independent test set. Finally, SHapley Additive exPlanations (SHAP) were applied to evaluate and rank the importance of individual variables.

Results
Of 210 patients, 157 were allocated to model development (training set) and 53 to evaluation of model performance (test set). RFE retained ten variables. The best-performing XGBoost model achieved a mean AUC of 86.6% across the cross-validation folds and maintained good diagnostic performance on the test set, with an AUC of 89.4% (95% CI: 79.1-99.7%) (Fig 1). SHAP analysis ranked LASr as the most important variable (Fig 2).

Conclusion
An ML model based on clinical and echocardiographic data, including LASr, demonstrated good diagnostic performance in identifying elevated LVFP in the setting of suspected HF with preserved EF.

Fig 1. ROC curve for the XGBoost model
Fig 2. SHAP summary plot
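The abstract does not state how the 95% CI for the test-set AUC was obtained. The reported interval is symmetric around the point estimate (89.4 ± 10.3 percentage points), which is consistent with, though not proof of, a normal-approximation method. As an assumption, here is a minimal standard-library sketch of one such method, DeLong's variance estimate for a single AUC, with binary labels (1 = PAWP ≥15 mmHg) and continuous model scores. The function name `delong_auc_ci` and the toy data are hypothetical, and the bounds are clipped to [0, 1].

```python
import math
from statistics import NormalDist

def delong_auc_ci(y_true, scores, level=0.95):
    """AUC with a DeLong normal-approximation confidence interval (clipped to [0, 1])."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two positives and two negatives")

    def psi(x, y):
        # Mann-Whitney kernel: a positive ranked above a negative scores 1, ties score 1/2.
        return 1.0 if x > y else 0.5 if x == y else 0.0

    # Structural components: placement value of each positive and each negative.
    v10 = [sum(psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / m for y in neg]
    auc = sum(v10) / m  # equals the mean of psi over all m*n pairs

    def sample_var(v):
        mean = sum(v) / len(v)
        return sum((a - mean) ** 2 for a in v) / (len(v) - 1)

    se = math.sqrt(sample_var(v10) / m + sample_var(v01) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)

# Toy usage (hypothetical data): 13 of 16 positive-negative pairs are ordered correctly.
y = [0, 0, 0, 1, 1, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.6, 0.3]
auc, lo, hi = delong_auc_ci(y, s)
print(f"AUC {auc:.4f} (95% CI {lo:.3f}-{hi:.3f})")
```

The pairwise loop is O(m·n), which is fine for a test set of around 50 patients; a rank-based formulation scales better for larger cohorts.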