Abstract

Funding Acknowledgements: None.

Introduction: Body mass index (BMI) has been associated with long-term survival in the setting of acute pulmonary embolism (PE) [1,2]. This association is non-linear, which makes it challenging for frequentist statistical analyses. Contemporary machine learning models handle datasets with non-linear relationships more readily.

Purpose: To examine the predictive power of BMI expressed as a continuous variable in a dataset of patients with PE using the extreme gradient boosting (XGBoost) machine learning framework.

Methods: All consecutive patients with confirmed PE admitted to the two tertiary teaching Clinical Hospitals in the metropolitan area from November 2013 until November 2018 were included in the study. The diagnosis was established by multislice computed tomography angiography. Machine learning models were built using the XGBoost framework. We allocated 80% of the dataset for model training and the remaining 20% for testing. Two models were tested: (1) a model with 20 variables tailored according to the Cox proportional hazards model in the initial publication; (2) a model with 11 independent predictors of outcome.

Results: The study population is described in detail in previous publications [1,2]. In brief, a total of 761 patients were included in the study. The population was predominantly female (57.4%), aged 73 years (61-81). The median overall follow-up was 675 days (114-1331). There were 335 deaths (44.0%) recorded during follow-up. A Cox proportional hazards model with 19 covariates significantly predicted death during follow-up (χ2 = 595.5, p<0.001). The first BMI quartile was an independent predictor of death (hazard ratio 1.66, 95% confidence interval 1.31-2.10). However, when expressed as a continuous variable, BMI failed to enter the model. In both XGBoost models, BMI expressed as a continuous variable was a feature with high predictive power, second only to the pulmonary embolism severity index score (Figures 1 and 2). The models discriminated well, with areas under the curve of 0.840 and 0.864, respectively.

Conclusion: In an analysis using XGBoost, a machine learning framework well suited to non-linear data, BMI expressed as a continuous variable emerged as the second strongest predictor of outcome among 761 patients with PE.
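The abstract does not state how the 80/20 split was drawn or how the area under the curve was computed. As a minimal sketch under those assumptions, the two steps can be illustrated in plain Python. The helper names `split_80_20` and `roc_auc`, the random seed, and the toy labels and scores are all hypothetical and are not taken from the study.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle a copy of the records and return (train, test) in an 80/20 ratio.

    Assumption: a simple random split; the abstract does not say whether
    the authors stratified by outcome.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0),
    counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Split 761 placeholder patient indices (the cohort size reported above).
train, test = split_80_20(range(761))

# Toy labels (1 = died) and predicted risks, purely illustrative.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
toy_auc = roc_auc(toy_labels, toy_scores)
```

With 761 records this split yields 609 training and 152 test records. The pairwise definition of the area under the curve is quadratic in the number of cases, which is fine for a cohort of this size; rank-based implementations are preferable for larger datasets.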
