This study aimed to analyze the time-series correlation between tuberculosis (TB)-related search terms and actual incidence data in China, to screen out the "leading" terms, and to construct a timely and efficient TB prediction model that can predict the next wave of the TB epidemic in advance. Monthly TB incidence data for Jiangsu Province, China, were collected from January 2011 to December 2020. A scoping approach was used to identify TB search terms covering common TB terminology, prevention, symptoms, and treatment. Search volumes for these terms in Jiangsu Province over the same period were collected from the Baidu Index database. Correlation coefficients between search terms and actual incidence were calculated using Python 3.6. A multiple linear regression model was constructed using SPSS 26.0, which was also used to calculate the goodness of fit and the prediction error of the model. A total of 16 keywords with correlation coefficients greater than 0.6 were identified, of which 11 were leading terms. The R² of the prediction model was 0.67 and the mean absolute percentage error (MAPE) was 10.23%. The TB prediction model based on Baidu Index data was able to predict the trend and intensity of the next wave of the TB epidemic 2 months in advance. The forecasting model is currently applicable only to Jiangsu Province.
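The abstract does not spell out how a term was classified as "leading", so the following is only a minimal sketch of one plausible reading: correlate each term's monthly search volume with incidence shifted by 0 to a few months, and treat a term as leading when the correlation peaks at a positive lag. It also shows the usual definition of MAPE. The function names (`pearson`, `lagged_correlation`, `best_lag`, `mape`), the `max_lag` parameter, and the lag-selection rule are illustrative assumptions, not the authors' code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (sd_x * sd_y)


def lagged_correlation(search, incidence, lag):
    """Correlate search[t] with incidence[t + lag].

    A positive lag means the search series precedes incidence by `lag` months.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(search, incidence)
    return pearson(search[:-lag], incidence[lag:])


def best_lag(search, incidence, max_lag=3):
    """Return (lag, correlation) for the lag with the highest correlation.

    Assumption: a term is called 'leading' if the best lag is greater than 0.
    """
    scores = {lag: lagged_correlation(search, incidence, lag)
              for lag in range(max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical toy data: incidence repeats the search series two months later.
search_volume = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11]
incidence = [0, 0] + search_volume[:-2]
lag, r = best_lag(search_volume, incidence)
```

Under this reading, terms whose best lag is positive and whose correlation exceeds the 0.6 threshold would be the candidates for the regression model's predictors.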