Infodemics surveillance system to detect and analyze health misinformation using big data and AI

I Zakir Hussain, M Lotto, J Kaur, P Morita, Z Butt

doi:10.1093/eurpub/ckad160.163


Open Access


Abstract

Background

Health misinformation disseminated on social media detrimentally affects the public's reactions toward public health measures, and this has led to costly and harmful public health crises. Because no comprehensive expert system collects and analyzes large volumes of social media data, public health officials are unable to mitigate health misinformation trends online. This study aimed to remedy this by designing and developing a big data pipeline and ecosystem, the Health Misinformation Analysis System (HMAS), capable of identifying and analyzing health misinformation online.

Methods

HMAS relies on Python, Elastic Stack, and the Twitter V2 API as its main technologies. It comprises five main components: (1) Data Extraction Framework (DEF); (2) Latent Dirichlet Allocation Topic Model; (3) Sentiment Analyzer; (4) Misinformation Classifier; (5) Elastic Cloud Deployment. Components 1-4 are hosted on a virtual machine with low computing and memory requirements. Through the DEF, HMAS extracts data from the Twitter V2 API. The expert system then uses pre-trained models to perform automatic health misinformation analysis. The analyzed data are loaded into Elastic Cloud and visualized through dashboards and analytics.

Results

HMAS performed accurately and efficiently. Independent investigators have used it successfully to extract significant insights, including a fluoride-related health misinformation use case that analyzed data from 2015 to 2021. HMAS has also been applied to a vaccine hesitancy use case (2007-2022) and a heat-related illnesses use case (2011-2022).

Conclusions

The novel HMAS expert system can potentially help public health officials globally to detect and analyze concerning trends in health misinformation online. The system can also grow to integrate social media data from several platforms for multiplatform analysis and to support non-Western language content.

Key messages

• The novel HMAS pipeline can collect, detect, and analyze the rapidly increasing amounts of misinformation proliferating on social media related to a particular topic or set of related topics.
• Identifying and monitoring health misinformation trends prevalent on social media through analytics dashboards and anomaly/risk detection alerts can lead to prompt governmental intervention.
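The flow through the components listed under Methods (extraction, topic assignment, sentiment analysis, misinformation classification, loading into Elastic) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the keyword lookups stand in for the LDA topic model and the pre-trained sentiment and misinformation models, the sample posts replace the Twitter V2 API extraction step, and every name, word list, and the index name `hmas-demo` is hypothetical. The final step only serializes records in the newline-delimited JSON layout that the Elasticsearch bulk API accepts; it does not contact a cluster.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical word lists standing in for the pre-trained models.
POSITIVE = {"safe", "effective", "protects"}
NEGATIVE = {"poison", "toxic", "dangerous"}
MISINFO_CUES = {"poison", "cover-up", "hoax"}
TOPIC_KEYWORDS = {
    "fluoride": {"fluoride", "fluoridation"},
    "vaccine": {"vaccine", "vaccines"},
}


@dataclass
class AnalyzedPost:
    post_id: str
    text: str
    topic: str
    sentiment: str
    misinformation: bool


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [word.strip(".,!?").lower() for word in text.split()]


def assign_topic(tokens):
    """Stand-in for the LDA topic model: topic with the most keyword hits."""
    scores = {
        topic: sum(tok in keywords for tok in tokens)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def score_sentiment(tokens):
    """Stand-in for the sentiment analyzer: positive minus negative hits."""
    balance = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


def flag_misinformation(tokens):
    """Stand-in for the misinformation classifier: any cue word present."""
    return any(t in MISINFO_CUES for t in tokens)


def analyze(raw_posts):
    """Run each extracted post through the three analysis stages."""
    results = []
    for post in raw_posts:
        tokens = tokenize(post["text"])
        results.append(
            AnalyzedPost(
                post_id=post["id"],
                text=post["text"],
                topic=assign_topic(tokens),
                sentiment=score_sentiment(tokens),
                misinformation=flag_misinformation(tokens),
            )
        )
    return results


def to_bulk_ndjson(posts, index_name="hmas-demo"):
    """Serialize posts as action/source line pairs for a bulk index request."""
    lines = []
    for post in posts:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": post.post_id}}))
        lines.append(json.dumps(asdict(post)))
    return "\n".join(lines) + "\n"


# Hypothetical sample input replacing the extraction step.
SAMPLE_POSTS = [
    {"id": "1", "text": "Fluoride in water is poison!"},
    {"id": "2", "text": "Vaccines are safe and effective."},
]

if __name__ == "__main__":
    print(to_bulk_ndjson(analyze(SAMPLE_POSTS)), end="")
```

Keeping each stage as a separate function mirrors the component separation described in the abstract, so a real model could replace any stand-in without changing the stages around it.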
