Abstract

With lifestyle and environmental changes, the prevalence of cardiovascular diseases (CVDs) is trending upward, placing pressure on limited medical resources. Accurately forecasting the daily number of hospital admissions (HAs) for CVDs can help optimize the allocation of these resources. In this study, we proposed a stacking ensemble model with a direct prediction strategy to predict the daily number of CVD admissions using HA data, air pollution data, and meteorological data. Sequential forward floating selection with early stopping was applied for feature selection. Five machine learning models, namely linear regression (LR), support vector regression (SVR), extreme gradient boosting (XGBoost), random forest (RF), and gradient boosting decision tree (GBDT), were used as base learners to construct the stacking model. We compared the performance of the proposed stacking model with that of the five base learners on three datasets. The experimental results indicated that our model performed best on all three datasets under four evaluation criteria: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2). In particular, on the CVDs dataset, the MAPE is 15.103 for LR, 11.862 for SVR, 10.571 for XGBoost, 10.378 for GBDT, 10.333 for RF, and 9.679 for the stacking model. Compared with the best base learner, RF, the stacking model reduced MAPE, RMSE, and MAE by 6.3%, 7.4%, and 6.3%, respectively, and improved R2 by 1.7%. These results indicate that the proposed stacking model can effectively forecast the daily number of hospitalizations for CVDs and provide decision support for hospital managers.
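For reference, the four evaluation criteria can be computed as in the minimal sketch below. It uses the standard textbook definitions and assumes, as is conventional, that MAPE is expressed in percent; the paper's exact conventions are not stated here, and the function names are hypothetical. The last line checks the reported MAPE reduction of the stacking model relative to RF: (10.333 - 9.679) / 10.333 ≈ 6.3%.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumes no true value is zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which `model` improves on `baseline` for a lower-is-better metric."""
    return 100.0 * (baseline - model) / baseline


# MAPE of RF (best base learner) vs. the stacking model on the CVDs dataset
print(f"{relative_reduction(10.333, 9.679):.1f}")  # 6.3
```

Since MAPE divides by the true daily count, it is undefined on days with zero admissions, which is one reason the other three criteria are reported alongside it.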

Highlights

  • Petinrin and Saeed [25] designed a stacking model combining support vector classification (SVC), decision tree (DT), K-nearest neighbor (KNN), and random forest (RF) to predict bioactive molecules; the stacking model outperformed other ensemble learning models such as adaptive boosting (AdaBoost), bagging [26], and voting ensembles.

  • Support vector regression (SVR), linear regression (LR), RF, gradient boosting decision tree (GBDT), and XGBoost were trained as first-stage models; their predictions, combined with selected key features, were then used to train a meta-learner that produces the final forecast.
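The two-stage scheme described above can be sketched in plain Python. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: two toy base learners (least-squares linear regression and a k-nearest-neighbour regressor) stand in for the paper's five; the meta-learner is assumed to be a linear regression; first-stage predictions are generated out-of-fold with contiguous K-fold splits; and `meta_cols`, the indices of the hypothetical "key" raw features passed to the meta-learner, is illustrative. All function names are hypothetical. For time-ordered admissions data, a forward-chaining split that never trains on future days is likely the safer choice.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]
Model = Callable[[Vector], float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve the square system a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_linear(x: Matrix, y: Vector, ridge: float = 1e-6) -> Model:
    """Least-squares linear regression with intercept; a tiny ridge term keeps X^T X invertible."""
    xb = [[1.0] + list(row) for row in x]
    d = len(xb[0])
    xtx = [[sum(r[i] * r[j] for r in xb) + (ridge if i == j and i > 0 else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(xb, y)) for i in range(d)]
    w = solve(xtx, xty)
    return lambda row: w[0] + sum(wi * v for wi, v in zip(w[1:], row))


def fit_knn(x: Matrix, y: Vector, k: int = 3) -> Model:
    """k-nearest-neighbour regressor: averages the targets of the k closest training rows."""
    data = list(zip(x, y))

    def predict(row: Vector) -> float:
        nearest = sorted(data, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], row)))[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return predict


def fit_stacking(x: Matrix, y: Vector,
                 base_fitters: Sequence[Callable[[Matrix, Vector], Model]],
                 meta_cols: Sequence[int], n_folds: int = 5) -> Model:
    """Stage 1: out-of-fold base predictions. Stage 2: meta-learner on those plus selected raw features."""
    n = len(y)
    folds = [range(i * n // n_folds, (i + 1) * n // n_folds) for i in range(n_folds)]
    oof = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for j, fit in enumerate(base_fitters):
            model = fit([x[i] for i in train], [y[i] for i in train])
            for i in fold:
                oof[i][j] = model(x[i])  # prediction for a row this base learner never saw
    meta = fit_linear([oof[i] + [x[i][c] for c in meta_cols] for i in range(n)], y)
    bases = [fit(x, y) for fit in base_fitters]  # refit base learners on all rows for inference

    def predict(row: Vector) -> float:
        return meta([b(row) for b in bases] + [row[c] for c in meta_cols])

    return predict


# Toy usage on synthetic data (not the paper's datasets): y = 1 + 2*x0 + 3*x1 + noise.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [1 + 2 * a + 3 * b + rng.gauss(0, 0.05) for a, b in X]
stack = fit_stacking(X, y, [fit_linear, fit_knn], meta_cols=[0])
prediction = stack([0.5, 0.5])  # should land close to the noise-free value 3.5
```

Training the meta-learner on out-of-fold rather than in-sample base predictions is the usual way to keep it from simply trusting whichever base learner overfits the training data most.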

Summary

Introduction

According to the report released by the World Health Organization (WHO), 17.9 million people die each year from cardiovascular diseases (CVDs), an estimated 31% of all deaths worldwide [1]. In addition, increasing evidence suggests that environmental exposures such as ambient air pollution [3] and temperature variability [4] contribute to the onset of CVDs, which may further increase their prevalence. For example, Chen et al. [3] conducted a multi-city analysis in southwestern China using a generalized additive model (GAM) and found that hospital admissions (HAs) for CVDs were associated with exposure to coarse particulate matter (PMC, particles with an aerodynamic diameter between 2.5 and 10 μm). Another study, conducted in 184 Chinese cities, linked temperature variability to increased HAs for overall and cause-specific CVDs [4].
