Abstract

Background/Introduction
The cardiopulmonary exercise test (CPET) has been a pivotal tool for functional and prognostic evaluation in adults with congenital heart disease (ACHD). During a routine CPET, a myriad of variables are collected, reflecting the cardiovascular, pulmonary, and skeletal muscle systems. However, clinicians typically use only a few easily accessible variables to establish prognosis, which can come at the cost of accuracy. Machine learning algorithms have undergone significant development in recent years and have found applications in the medical field. One of their main advantages is the ability to handle a large number of variables and extract information from their various combinations while excluding redundant data, thus offering a higher level of prediction accuracy.

Purpose
Our objective was to develop a machine learning model to predict death during follow-up in a large population of ACHD patients who underwent routine CPET at a large-volume single specialist centre.

Methods
All available CPET studies for ACHD patients (age > 15 years) performed from December 1999 to December 2021 at our centre were included, and all standard available variables were extracted. The primary outcome was all-cause mortality after the CPET, collected up to November 2023. Data exploration, data cleaning, and feature engineering were conducted using Python (v3.8). Continuous variables were standardised. The supervised machine learning algorithm XGBoost was used for classification (all-cause mortality), with the best hyperparameters selected through cross-validation. For the final model, a permutation-based feature importance analysis was used to identify the variables with the greatest impact on model accuracy.

Results
A total of 6361 studies were included, with a median age of 31 years (IQR 23-43); 56% of patients were male. During follow-up, a total of 491 deaths were recorded.
Of the available demographic, clinical and exercise parameters (n=129 variables), 21 were retained for the subsequent analysis on the basis of completeness, absence of significant correlation, and clinical relevance. These included demographic variables (age and sex) and CPET variables (peak VO2, FEV1, peak heart rate, exercise time, etc.) and were used to train the model. In the test set, the model predicted death after CPET with an accuracy of 93.5% and an area under the ROC curve (AUC) of 85.0% (Figure 1). The parameters with the highest relative feature importance were FEV1, body mass index and percent predicted peak VO2 (Figure 2).

Conclusions
Machine learning models can be effectively employed in specialised cardiac populations, such as ACHD, potentially providing a high level of prediction accuracy by efficiently harnessing the available data. However, the applicability of this model to mortality prediction in ACHD patients requires further external validation.

Figure 1: ROC curve and AUC for the model predictions
Figure 2: Feature importance using permutation
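
The standardisation, AUC and permutation-based feature importance steps named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the synthetic two-feature data set, the toy `predict` function (a stand-in for a fitted classifier such as XGBoost) and all helper names are assumptions made purely for illustration, and only the standard library is used.

```python
import random
from statistics import mean, pstdev


def standardise(column):
    """Rescale a continuous variable to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in AUC when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = roc_auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[j] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
            permuted_auc = roc_auc(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted_auc)
        importances.append(mean(drops))
    return baseline, importances


# Hypothetical synthetic data: feature 0 drives the outcome, feature 1 is noise.
data_rng = random.Random(42)
n_samples = 200
informative = [data_rng.gauss(0, 1) for _ in range(n_samples)]
noise = [data_rng.gauss(0, 1) for _ in range(n_samples)]
y = [1 if v < -1.0 else 0 for v in informative]  # event when feature 0 is low

X = [list(pair) for pair in zip(standardise(informative), standardise(noise))]


def predict(row):
    """Toy risk score standing in for a trained model: lower feature 0, higher risk."""
    return -row[0]


baseline, importances = permutation_importance(predict, X, y)
print(f"baseline AUC: {baseline:.3f}")
for name, value in zip(["feature_0", "feature_1"], importances):
    print(f"{name}: mean AUC drop {value:.3f}")
```

Shuffling a feature breaks its link with the outcome while leaving its distribution unchanged, so a large drop in the chosen metric suggests the model relies on that feature. In this sketch the noise feature shows no drop because the toy score ignores it; with a real fitted model the same procedure would normally be run on held-out data.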