Abstract
Background: Hemorrhagic fever with renal syndrome (HFRS) continues to attract public attention because of outbreaks in various cities in China. Predicting future outbreaks or epidemics from past incidence data can help health departments take targeted preventive measures in advance. In this study, we propose a multistep prediction strategy based on extreme gradient boosting (XGBoost) for HFRS as an extension of the one-step prediction model. The fitting and prediction accuracy of the XGBoost model are compared with those of the autoregressive integrated moving average (ARIMA) model using several evaluation indicators.
Methods: We collected HFRS incidence data for mainland China from 2004 to 2018. The data from 2004 to 2017 served as the training set for establishing the seasonal ARIMA model and the XGBoost model, while the 2018 data were used to test prediction performance. In the multistep XGBoost forecasting model, one-hot encoding was used to handle seasonal features. A series of evaluation indices was computed to assess the accuracy of the multistep XGBoost forecast model.
Results: There were 200,237 HFRS cases in China from 2004 to 2018. A long-term downward trend and bimodal seasonality were identified in the original time series. The optimal ARIMA(3, 1, 0) × (1, 1, 0)₁₂ model was selected according to the minimum corrected Akaike information criterion (CAIC) value. In the fitting part, the ME, MAE, MPE, MAPE, and MASE of the XGBoost model were higher than those of the ARIMA model, whereas its RMSE was lower. The prediction performance indicators (MAE, MPE, MAPE, RMSE, and MASE) of both the one-step and the multistep XGBoost models were all notably lower than those of the ARIMA model.
Conclusions: The multistep XGBoost prediction model showed much better prediction accuracy and model stability than the multistep ARIMA prediction model. The XGBoost model performed better in predicting complicated, nonlinear data such as HFRS incidence. Additionally, multistep prediction models are more practical than one-step prediction models for forecasting infectious diseases.
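The abstract names six evaluation indices but does not define them. The following is a minimal sketch of how they are conventionally computed, not the authors' code. It assumes that the error is actual minus forecast, and that MASE is scaled by the in-sample mean absolute error of a seasonal naive forecast with period 12. The function name `forecast_accuracy` and the toy numbers are illustrative only.

```python
import math


def forecast_accuracy(actual, forecast, train, season=12):
    """Return ME, RMSE, MAE, MPE, MAPE and MASE for one forecast.

    Assumed conventions (not stated in the abstract):
    error = actual - forecast; MASE is scaled by the in-sample MAE
    of a seasonal naive forecast with the given period.
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    # Percentage errors relative to the observed values.
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    # In-sample MAE of the seasonal naive forecast y[t] = y[t - season].
    naive_abs = [abs(train[i] - train[i - season])
                 for i in range(season, len(train))]
    scale = sum(naive_abs) / len(naive_abs)
    mae = sum(abs(e) for e in errors) / n
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": mae,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
        "MASE": mae / scale,
    }


if __name__ == "__main__":
    # Made-up numbers with a period of 2, for illustration only.
    scores = forecast_accuracy([10, 20], [8, 25],
                               train=[10, 20, 12, 24], season=2)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Under these conventions, a lower RMSE, MAE, MAPE, or MASE indicates a more accurate forecast, while ME and MPE are signed and indicate bias rather than accuracy.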
Highlights
Hemorrhagic fever with renal syndrome (HFRS) is still attracting public attention because of its outbreak in various cities in China
Hemorrhagic fever with renal syndrome (HFRS) is a zoonotic disease caused by hantaviruses that cause a high degree of harm to humans
We aim to develop a prediction model for HFRS in mainland China using one-step and multistep extreme gradient boosting (XGBoost) models and to compare them with an autoregressive integrated moving average (ARIMA) model
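The difference between one-step and multistep prediction can be sketched as follows. This is a hedged illustration, not the authors' procedure. The source says only that one-hot encoding handled the seasonal features, so the recursive strategy (feeding each prediction back in as a lag), the choice of three lags, and all function names here are assumptions. `predict_one` stands for any fitted one-step model, for example a wrapper around a trained gradient boosting regressor.

```python
def month_one_hot(month):
    """Return a 12-element 0/1 vector for a calendar month (1-12)."""
    return [1 if m == month else 0 for m in range(1, 13)]


def make_features(history, month, n_lags=3):
    """Build one feature row: recent lags (newest first) plus one-hot month."""
    lags = list(history[-n_lags:])[::-1]
    return lags + month_one_hot(month)


def recursive_forecast(predict_one, history, start_month, steps, n_lags=3):
    """Forecast `steps` months ahead with a one-step model.

    Each prediction is appended to the window and reused as a lag
    for the next step (recursive strategy, assumed for illustration).
    """
    window = list(history)
    month = start_month
    forecasts = []
    for _ in range(steps):
        y_hat = predict_one(make_features(window, month, n_lags))
        forecasts.append(y_hat)
        window.append(y_hat)
        month = month % 12 + 1  # roll over from December to January
    return forecasts


if __name__ == "__main__":
    # Stand-in model: the mean of the three lag features (illustrative only).
    def toy_model(row):
        return sum(row[:3]) / 3

    print(recursive_forecast(toy_model, [3, 6, 9], start_month=1, steps=2))
```

With `steps=1` this reduces to one-step prediction. With `steps=12` it would produce a full year of monthly forecasts from data ending the previous December, which matches the 2018 test year described in the abstract.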
Summary
Hemorrhagic fever with renal syndrome (HFRS) is still attracting public attention because of its outbreak in various cities in China. HFRS is a zoonotic disease caused by hantaviruses that cause a high degree of harm to humans. Most HFRS cases occur in Asian and. More than 100,000 cases of HFRS occur every year worldwide, and China accounts for more than 90% of them [1, 2]. Although the government and health departments have taken many control measures, such as active rodent control, vaccination, health education, environmental management of epidemic areas, and strengthened disease surveillance, HFRS still severely affects people's health, with approximately 9,000–30,000 cases annually in China [7].