Abstract
Background: Advanced prediction of the daily incidence of COVID-19 can aid policy making on the prevention of disease spread, which can profoundly affect people's livelihood. In previous studies, predictions were investigated for single or several countries and territories.

Objective: We aimed to develop models that can be applied for real-time prediction of COVID-19 activity in all individual countries and territories worldwide.

Methods: Data of the previous daily incidence and infoveillance data (search volume data via Google Trends) from 215 individual countries and territories were collected. A random forest regression algorithm was used to train models to predict the daily new confirmed cases 7 days ahead. Several methods were used to optimize the models, including clustering the countries and territories, selecting features according to the importance scores, performing multiple-step forecasting, and upgrading the models at regular intervals. The performance of the models was assessed using the mean absolute error (MAE), root mean square error (RMSE), Pearson correlation coefficient, and Spearman correlation coefficient.

Results: Our models can accurately predict the daily new confirmed cases of COVID-19 in most countries and territories. Of the 215 countries and territories under study, 198 (92.1%) had MAEs <10 and 187 (87.0%) had Pearson correlation coefficients >0.8. For the 215 countries and territories, the mean MAE was 5.42 (range 0.26-15.32), the mean RMSE was 9.27 (range 1.81-24.40), the mean Pearson correlation coefficient was 0.89 (range 0.08-0.99), and the mean Spearman correlation coefficient was 0.84 (range 0.2-1.00).

Conclusions: By integrating previous incidence and Google Trends data, our machine learning algorithm was able to predict the incidence of COVID-19 in most individual countries and territories accurately 7 days ahead.
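The four evaluation metrics named in the Methods can be illustrated with a minimal, standard-library sketch. This is not the authors' code: the function names and the toy `observed` and `predicted` series are assumptions made for illustration, and the Spearman coefficient is computed here as the Pearson coefficient of average ranks.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson(x, y):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed series."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy daily case counts (made up for illustration, not data from the study).
observed = [10, 12, 15, 20, 18]
predicted = [11, 11, 16, 18, 19]

print("MAE:", mae(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("Pearson:", pearson(observed, predicted))
print("Spearman:", spearman(observed, predicted))
```

MAE and RMSE are expressed in the same units as the case counts, whereas the two correlation coefficients describe how well the predicted curve tracks the shape of the observed one, which is presumably why the abstract reports both kinds of measure.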
Highlights
COVID-19, a highly infectious disease with serious clinical manifestations, was first reported in China in late 2019 and spread to other countries within weeks [1,2]
We aim to develop an efficient and novel methodology for real-time prediction of COVID-19 activity based on the previous daily incidence of COVID-19 and infoveillance data in all individual countries and territories worldwide
There were two main reasons for selecting these 14 terms for Google Trends data collection: (1) Previous studies [6,10,11,12,13] have shown that the internet search data for these terms were correlated with the incidence of COVID-19
Summary
COVID-19, a highly infectious disease with serious clinical manifestations, was first reported in China in late 2019 and spread to other countries within weeks [1,2]. Severe outbreaks occurred, and case numbers took a long time to decrease. In other regions, it took more time for the spread of the disease to affect people. Results: Our models can accurately predict the daily new confirmed cases of COVID-19 in most countries and territories. Conclusions: By integrating previous incidence and Google Trends data, our machine learning algorithm was able to predict the incidence of COVID-19 in most individual countries and territories accurately 7 days ahead.
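One way to picture how previous incidence and search-volume data can be combined for a 7-day-ahead forecast is to turn the two daily series into supervised (features, target) pairs. The sketch below is an assumption-laden illustration, not the authors' pipeline: the function name `make_supervised`, the `n_lags` window length, and the toy series are all hypothetical.

```python
def make_supervised(cases, trends, n_lags=7, horizon=7):
    """Build (features, target) pairs for forecasting `horizon` days ahead.

    Each feature row holds the last `n_lags` days of case counts followed by
    the last `n_lags` days of search volume; the target is the case count
    `horizon` days after the final day in that window.
    """
    if len(cases) != len(trends):
        raise ValueError("cases and trends must cover the same days")
    rows, targets = [], []
    for t in range(n_lags - 1, len(cases) - horizon):
        window = slice(t - n_lags + 1, t + 1)
        rows.append(list(cases[window]) + list(trends[window]))
        targets.append(cases[t + horizon])
    return rows, targets

# Toy series (made up): 20 days of case counts and a search-volume index.
toy_cases = list(range(20))
toy_trends = [2 * d for d in range(20)]

features, targets = make_supervised(toy_cases, toy_trends, n_lags=3, horizon=7)
print(len(features), "training rows")
print("first row:", features[0], "-> target:", targets[0])
```

Rows built this way could then be passed to any regressor, for example a random forest as described in the abstract. The window length and the choice of search terms would need tuning per country or cluster.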