Background: It is difficult to detect an outbreak of an emergency infectious disease using the existing surveillance system. Here we investigate the utility of the Baidu Search Index, a measure of a keyword's search volume on Baidu, for early warning and for predicting the epidemic trend of COVID-19.

Methods: The daily number of cases and the Baidu Search Index of 8 keywords (weighted by population) from December 1, 2019 to March 15, 2020 were collected and analyzed using time series analysis and Spearman correlation with different time lags. To predict the daily number of COVID-19 cases from the Baidu Search Index, a zero-inflated negative binomial regression model was used in phase 1 and a negative binomial regression model in phases 2 and 3, based on the characteristics of the independent variable.

Results: The Baidu Search Index of all keywords was significantly higher in Wuhan than in Hubei (excluding Wuhan) and China (excluding Hubei). Before the causative pathogen was identified, the search volume of “Influenza” and “Pneumonia” in Wuhan increased with the number of new-onset cases, with correlation coefficients of 0.69 and 0.59, respectively. After the pathogen was made public but before COVID-19 was classified as a notifiable disease, the search volume of “SARS”, “Pneumonia” and “Coronavirus” in all study areas increased with the number of new-onset cases, with correlation coefficients of 0.69 to 0.89, while “Influenza” became negatively correlated (rs: -0.56 to -0.64). After COVID-19 was closely monitored, the Baidu Search Index of “COVID-19”, “Pneumonia”, “Coronavirus”, “SARS” and “Mask” could predict the epidemic trend with lead times of 15, 5 and 6 days in Wuhan, Hubei (excluding Wuhan) and China (excluding Hubei), respectively. From 21 January to 9 February, the predicted number of cases would be 1.84-fold and 4.81-fold higher than the actual number of cases in Wuhan and Hubei (excluding Wuhan), respectively.

Conclusion: The Baidu Search Index could be used for early warning and for predicting the epidemic trend of COVID-19, but the relevant search keywords changed between periods. Considering the time lag from onset to diagnosis, especially in areas with a shortage of medical resources, internet search data can be a highly effective supplement to the existing surveillance system.
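
The lagged Spearman analysis mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names (`average_ranks`, `spearman`, `lagged_spearman`) and the toy series are hypothetical, chosen only to show how a search index on day t can be correlated with case counts on day t + lag to find a plausible lead time. It assumes both series are daily, aligned, and of equal length.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed series."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # rho is undefined for a constant series
    return cov / (var_x * var_y) ** 0.5


def lagged_spearman(search_index, cases, max_lag):
    """Correlate search_index on day t with cases on day t + lag, per lag."""
    results = {}
    for lag in range(max_lag + 1):
        x = search_index[: len(search_index) - lag]
        y = cases[lag:]
        results[lag] = spearman(x, y)
    return results


# Toy data (illustrative only, not from the study): cases echo the
# search index two days later.
toy_search = [1, 3, 2, 5, 4, 7, 6, 9, 8, 10]
toy_cases = [0, 0, 1, 3, 2, 5, 4, 7, 6, 9]
rho_by_lag = lagged_spearman(toy_search, toy_cases, max_lag=3)
best_lag = max(rho_by_lag, key=rho_by_lag.get)
```

On the toy series, `best_lag` picks out the two-day shift that was built into the data. With real surveillance data the correlations would be noisier, and the lag with the highest coefficient is only a candidate lead time, not proof of predictive value.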