Abstract

Predicting the number of new suspected or confirmed cases of novel coronavirus disease 2019 (COVID-19) is crucial for preventing and controlling the COVID-19 outbreak. Social media search indexes (SMSI) for dry cough, fever, chest distress, coronavirus, and pneumonia were collected from 31 December 2019 to 9 February 2020, and data on new suspected COVID-19 cases were collected from 20 January 2020 to 9 February 2020. We used lagged series of SMSI to predict the number of new suspected COVID-19 cases over this period. To avoid overfitting, five methods (subset selection, forward selection, lasso regression, ridge regression, and elastic net) were used to estimate the coefficients. We selected the optimal method to predict new suspected COVID-19 case numbers from 20 January 2020 to 9 February 2020, and further validated it on new confirmed COVID-19 cases from 31 December 2019 to 17 February 2020. The number of new suspected COVID-19 cases correlated significantly with the lagged series of SMSI, and SMSI could be detected 6–9 days earlier than new suspected cases. The optimal method was subset selection, which had the lowest estimation error and a moderate number of predictors. In the validation, predictions from the subset selection method also correlated significantly with new confirmed COVID-19 cases, and SMSI at a lag of 10 days correlated significantly with new confirmed cases. SMSI could therefore be a significant early predictor of the number of COVID-19 infections, enabling government health departments to locate potential high-risk outbreak areas.
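The idea of predicting case counts from lagged search-index series can be illustrated with a small sketch. It assumes a single index series with lags of 1 to 3 days and fits ridge regression (one of the five estimators named above) in closed form. The function names, the synthetic data, and the penalty values are hypothetical; this is not the authors' implementation, which used five keyword series and compared several estimators by estimation error.

```python
from typing import List, Sequence, Tuple


def lagged_design(x: Sequence[float], lags: Sequence[int]) -> List[List[float]]:
    """Build rows [x[t - l] for l in lags] for every t where all lags are available."""
    start = max(lags)
    return [[x[t - l] for l in lags] for t in range(start, len(x))]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(X: List[List[float]], y: Sequence[float], lam: float) -> Tuple[float, List[float]]:
    """Ridge regression on centered data; the intercept is not penalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Penalized normal equations: (Xc^T Xc + lam * I) beta = Xc^T yc
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(b * mu for b, mu in zip(beta, x_mean))
    return intercept, beta


# Hypothetical synthetic data: case counts follow the index with a 3-day lag.
index = [(7 * i) % 11 + 0.5 * i for i in range(40)]
lags = [1, 2, 3]
X = lagged_design(index, lags)
cases = [2.0 * index[t - 3] + 5.0 for t in range(max(lags), len(index))]

ols_intercept, ols_beta = ridge_fit(X, cases, lam=0.0)  # lam = 0 reduces to least squares
_, ridge_beta = ridge_fit(X, cases, lam=50.0)           # a larger lam shrinks the coefficients
```

With `lam=0.0` the fit recovers the planted coefficient on the 3-day lag, while a positive penalty pulls all coefficients toward zero. Lasso and elastic net add an L1 term and have no closed form, so they would need an iterative solver such as coordinate descent.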

Highlights

  • A novel coronavirus, COVID-19, has emerged over the last few weeks since its outbreak in Wuhan City, China [1,2,3,4,5]

  • This study investigated the correlation between the number of new COVID-19 cases and the Baidu Search Index (BSI), the search index of Baidu, a popular search engine in China, which served as the reference social media search index (SMSI)

  • We show a positive correlation between the series of new suspected COVID-19 cases and the lagged series of five keywords in the BSI (Table 1)
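Lead times like those reported here (6–9 days for suspected cases, 10 days for confirmed cases) are the kind of result obtained by correlating a case series with lagged copies of a search index. The sketch below scans lags with Pearson correlation, assuming a hypothetical synthetic series in which cases trail the index by exactly 7 days; the function names are illustrative, and the sketch does not reproduce Table 1 or the authors' significance testing.

```python
from math import sqrt
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def lagged_correlation(index: Sequence[float], cases: Sequence[float], lag: int) -> float:
    """Correlate index[t - lag] with cases[t], i.e. the index leads cases by `lag` days."""
    n = len(cases)
    return pearson(index[: n - lag], cases[lag:])


def best_lead(index: Sequence[float], cases: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Return the (lag, correlation) pair with the highest correlation for lags 0..max_lag."""
    scores = [(lag, lagged_correlation(index, cases, lag)) for lag in range(max_lag + 1)]
    return max(scores, key=lambda s: s[1])


# Hypothetical synthetic data: cases track the index 7 days later (first 7 days set to 0).
index = [(7 * i) % 11 + 0.5 * i for i in range(60)]
cases = [0.0] * 7 + [3.0 * index[t - 7] + 1.0 for t in range(7, 60)]

lag, r = best_lead(index, cases, max_lag=14)
```

On real data the peak is rarely this sharp, so a scan like this would normally be paired with a significance test for each lag before reading off a lead time.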


Summary

Introduction

A novel coronavirus, COVID-19 (formerly known as 2019-nCoV), has emerged over the last few weeks since its outbreak in Wuhan City, China [1,2,3,4,5]. This severe acute respiratory syndrome (SARS)-like virus has infected over 75,000 people and killed over 2,000 in China [1,2,3,4,5]. Increasing numbers of cases have been reported in other countries on every continent except Antarctica, and the rate of new cases outside China has outpaced the rate within China. In the United States, clusters of COVID-19 with local transmission have been identified throughout most of the country [6,7].
