Governments and healthcare institutions increasingly recognize the value of social media data in responding to disease outbreaks. Such data spreads rapidly and is rich in content, including people's real-time reactions and calls for help. However, research remains limited on which social media data can provide information support and on why it can do so. This study addresses that gap by using empirical and prediction models to investigate which social media information is more likely to reflect the severity of an outbreak, and by using content analysis to explain why social media data can reflect a pandemic. The COVID-19 outbreak serves as the case example because it enhances the generalizability of the results and allows the model to be validated with multiple waves of data. The empirical model results indicate that social media activity from public users is more likely to reflect the ground truth during a pandemic. In particular, negative sentiment expressed in blog posts by public users during a pandemic aligns more closely with the severity of the disease outbreak. A prediction model was then proposed to further validate the findings of the empirical model. Finally, a content analysis was conducted based on the conclusions drawn from the empirical and prediction models. It revealed that the predictive capability of social media data for a pandemic originates from individuals' self-reporting of illness. These findings clarify which types of information can be used for pandemic monitoring and forecasting, and they have significant implications for governments and healthcare institutions that use social media data for these purposes.