Abstract
Background: Many machine learning (ML) models exist, including Multilayer Perceptron (MLP), Random Forest (RF), Support Vector Machine (SVM), and Gradient Boosted Machine (GBM), but their value for predicting outcomes in patients with heart failure has not been compared.

Aim: To predict all-cause rehospitalisation and all-cause death at 1, 3 and 12 months after discharge from a first hospitalisation for heart failure using four machine learning models.

Methods: The National Health Service Greater Glasgow and Clyde Health Board serves a population of ∼1.1 million. We obtained de-identified administrative data, including investigations, diagnoses and prescriptions, linked to hospital admissions and deaths for anyone with a diagnosis of vascular disease or heart failure, or prescribed loop diuretics, statins or neuro-endocrine antagonists, at any time between 1st January 2010 and 1st June 2018. Patients who were under 18 or had no prior hospitalisation for heart failure were excluded. Four ML algorithms using 46 variables were applied.

Results: Of 360,000 people who met the above criteria between 2010 and 2018, 6,372 had a hospitalisation for heart failure prior to 1st January 2010 and 8,304 had a first hospitalisation for heart failure thereafter. Between 2010 and 2018 there were 3,086 re-hospitalisations over 24 hours and 3,706 patients died, with 5,070 patients experiencing the composite outcome. GBM and RF consistently outperformed MLP and SVM when AUC, sensitivity and specificity were considered together, with GBM performing best in all scenarios. GBM and RF are both tree-based models, and SVM and MLP regularly reported very poor sensitivity or specificity despite an AUC similar to the others; this suggests that SVM and MLP may be suffering from overfitting and might perform better in larger data-sets. Both GBM and RF work by ordering variables, so the final model can be used to determine the most important prediction variables. Age, number of times a blood sample was taken out of hospital, length of stay, social deprivation index and haemoglobin concentration consistently ranked amongst the most important variables. Models predicted all 1-month events better than later events.

Conclusions: Some, but not all, ML models applied to this data-set predicted rehospitalisation and death with great accuracy for up to 3 months after a first hospitalisation for heart failure. The models identified several important prognostic variables that are currently seldom collected in clinical research registries but perhaps should be.

Funding Acknowledgement: Type of funding source: Public grant(s) – National budget only. Main funding source(s): Medical Research Council
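The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes scikit-learn, uses a synthetic data-set with 46 predictors in place of the de-identified cohort, and every hyperparameter, the 0.5 decision threshold and the names `models`, `results` and `top_features` are illustrative assumptions. It fits the four model families, reports AUC alongside sensitivity and specificity (since a similar AUC can hide a poor value of either), and ranks variables by the GBM's importance scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cohort (assumption): 46 predictors and an
# imbalanced binary outcome, e.g. an event within a fixed follow-up window.
X, y = make_classification(n_samples=1000, n_features=46, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Four model families named in the abstract; settings are illustrative only.
# MLP and SVM are scale-sensitive, so they are wrapped with a scaler.
models = {
    "MLP": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32,),
                                       max_iter=500, random_state=0)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(),
                         SVC(probability=True, random_state=0)),
    "GBM": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]   # predicted event probability
    pred = (prob >= 0.5).astype(int)           # assumed 0.5 threshold
    tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
    results[name] = {
        "auc": roc_auc_score(y_test, prob),
        "sensitivity": tp / (tp + fn),         # true positive rate
        "specificity": tn / (tn + fp),         # true negative rate
    }

# Indices of the five predictors the GBM ranks as most important.
top_features = np.argsort(models["GBM"].feature_importances_)[::-1][:5]

for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
print("Top GBM feature indices:", top_features.tolist())
```

Reporting sensitivity and specificity next to AUC matters here because AUC is threshold-free, whereas the other two depend on the chosen cut-off; on real data the threshold would normally be tuned rather than fixed at 0.5.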