Abstract

China’s reported cases of Human Immunodeficiency Virus (HIV) and AIDS increased from over 50,000 in 2011 to more than 130,000 in 2017, while AIDS-related search indices on Baidu rose from 2.1 million to 3.7 million over the same period. In China, people seek AIDS-related information on Baidu, one of the world’s largest search engines. We study the relationship between national HIV surveillance data and the Baidu Index (BDI), and use it to monitor the AIDS epidemic and inform targeted interventions. After screening keywords and constructing a composite index, we fitted seasonal autoregressive integrated moving average (ARIMA) models. The search query data most strongly correlated with reported cases were then used as an external variable in an ARIMA model with exogenous variables (ARIMAX) for epidemic prediction. A significant correlation between monthly reported HIV/AIDS cases and the Baidu Composite Index (r = 0.845, P < 0.001) was observed in time series plots. Compared with the ARIMA model based on AIDS surveillance data alone, the ARIMAX model with the Baidu Composite Index had the lowest Akaike information criterion (AIC, 839.42) and the most accurate predictions (mean absolute percentage error, MAPE, of 6.11%). We showed that the BDI and reported HIV/AIDS cases follow closely correlated trends during both rising and declining phases of the epidemic. Baidu search query data may therefore be a useful indicator for reliably monitoring and predicting the HIV/AIDS epidemic in China.
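The two headline metrics above, the Pearson correlation coefficient r between the monthly case series and a search index, and the mean absolute percentage error (MAPE) of the forecasts, can be computed as in the minimal sketch below. The keyword screening step is modelled here as keeping keywords whose correlation with the case series reaches a threshold and summing them into a composite index; that rule, the threshold value, and the function names (pearson_r, mape, composite_index) are hypothetical illustrations, not the authors' documented procedure. The sketch does not compute the P value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def composite_index(keyword_series, cases, threshold=0.5):
    """Hypothetical screening rule: keep keywords whose monthly search series
    correlates with the case series at r >= threshold, then sum the kept
    series month by month into one composite index."""
    kept = [name for name, s in keyword_series.items() if pearson_r(s, cases) >= threshold]
    if not kept:
        return [], []
    index = [sum(keyword_series[name][t] for name in kept) for t in range(len(cases))]
    return kept, index
```

With real data, pearson_r(monthly_cases, composite) gives the same kind of statistic as the reported r, and mape(observed, forecast) the same kind of figure as the reported MAPE; the study's own data are not reproduced here.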

Highlights

  • Internet search engines have become an important platform for public access to information and an archive of data, the latter serving as a research source in various disciplines

  • We use Internet search data from the Baidu Index to survey the Human Immunodeficiency Virus (HIV)/AIDS epidemic in China

  • We developed an Autoregressive Integrated Moving Average with Exogenous Variables (ARIMAX) model based on an Internet keyword search index and examined whether the search index improved the model’s forecasting ability[19]
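An ARIMAX model extends ARIMA by adding an exogenous regressor, here a search index. The sketch below fits the simplest such case, ARIMAX(1,1,0), by conditional least squares: after first differencing, dy[t] = c + phi * dy[t-1] + beta * dx[t-lag] + e[t], and it reports a Gaussian AIC so the model with and without the search term can be compared. The order (1,1,0), the lag, the least-squares fitting method, and all names (diff, solve, fit_arimax_110) are assumptions for illustration. The study itself used seasonal ARIMA models whose orders are not given in this summary, and its AIC values (such as 839.42) come from full maximum-likelihood estimation, so they are not on the same scale as this simplified AIC. A full seasonal fit would normally come from a library such as the SARIMAX class in statsmodels, through its exog argument.

```python
import math


def diff(series):
    """First difference: d[t] = s[t+1] - s[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_arimax_110(y, x=None, lag=0):
    """Conditional least-squares fit (hypothetical helper) of
        dy[t] = c + phi * dy[t-1] (+ beta * dx[t-lag]) + e[t]
    on first differences. With x=None this is a plain ARIMA(1,1,0).
    Returns (coefficients, residuals, aic)."""
    dy = diff(y)
    dx = diff(x) if x is not None else None
    # Same start for ARIMA and ARIMAX fits with the same lag, so AICs use identical months
    start = max(1, lag)
    rows, targets = [], []
    for t in range(start, len(dy)):
        row = [1.0, dy[t - 1]]
        if dx is not None:
            row.append(dx[t - lag])
        rows.append(row)
        targets.append(dy[t])
    k = len(rows[0])
    # Normal equations (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * tgt for r, tgt in zip(rows, targets)) for i in range(k)]
    coef = solve(xtx, xty)
    resid = [tgt - sum(c * v for c, v in zip(coef, r)) for r, tgt in zip(rows, targets)]
    n = len(resid)
    rss = sum(e * e for e in resid)
    # Gaussian AIC up to an additive constant; +1 parameter for the noise variance
    aic = n * math.log(rss / n) + 2 * (k + 1)
    return coef, resid, aic
```

Passing the same lag to fit_arimax_110(y, None, lag) and fit_arimax_110(y, x, lag) fits the plain ARIMA and the ARIMAX on identical months, so their AIC values are directly comparable and the lower one is preferred.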


Summary

Result

In Beijing and Shanghai, the number of HIV/AIDS cases is far lower than in other provinces, but the search indices for various keywords remain relatively high. In Guangxi and Xinjiang, the correlation between keyword searches and the number of AIDS cases was significantly lower than in Sichuan and Chongqing, which had comparable epidemic levels and search volumes, and it even showed a relatively weak negative correlation (Supplementary Fig. S1).

Cross-correlation plots at different lag orders show that the differenced Baidu Composite Index (CI) has a significant lagged effect on the differenced HIV/AIDS case reports. The estimated model parameters are significantly non-zero, and the residual autocorrelation test shows that the residuals are randomly distributed, with no remaining autocorrelation (Table 4, Fig. 5). The cases predicted by the final fitted model are largely consistent with the observed cases, which fall within the 95% confidence interval (Supplementary Fig. S4).
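The lag analysis and the residual check can be sketched as follows: cross_corr computes the sample cross-correlation between a leading series (for example the differenced Baidu Composite Index) and a target series (the differenced case counts) at lags 0 to max_lag, and ljung_box_q computes the Ljung-Box Q statistic on model residuals. The summary does not say which residual autocorrelation test was used, so Ljung-Box is an assumed, commonly used choice, and both function names are hypothetical. Q is compared with the chi-square critical value for h degrees of freedom (about 18.31 for h = 10 at the 5% level), reduced by the number of fitted ARMA parameters if that correction is applied.

```python
import math


def cross_corr(lead, target, max_lag):
    """Sample cross-correlation r_k = corr(lead[t-k], target[t]) for k = 0..max_lag.
    A large r_k at k > 0 means `lead` moves k periods ahead of `target`."""
    n = len(lead)
    if len(target) != n:
        raise ValueError("series must have equal length")
    ml, mt = sum(lead) / n, sum(target) / n
    denom = math.sqrt(sum((a - ml) ** 2 for a in lead) * sum((b - mt) ** 2 for b in target))
    return {
        k: sum((lead[t - k] - ml) * (target[t] - mt) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    }


def ljung_box_q(resid, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k) on residuals."""
    n = len(resid)
    m = sum(resid) / n
    c0 = sum((e - m) ** 2 for e in resid)
    q = 0.0
    for k in range(1, h + 1):
        # Lag-k sample autocorrelation of the residuals
        r_k = sum((resid[t] - m) * (resid[t - k] - m) for t in range(k, n)) / c0
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q
```

Applied to the residuals of a fitted model, a Q below the critical value is consistent with the finding that no autocorrelation remains in the residuals.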
