Comparison of pretrained transformer-based models for influenza and COVID-19 detection using social media text data in Saskatchewan, Canada.

Yuan Tian,Nathaniel Osgood,Wenjing Zhang,Wade Mcdonald,Lujie Duan

doi:10.3389/fdgth.2023.1203874

Abstract

The use of social media data provides an opportunity to complement traditional influenza and COVID-19 surveillance methods for the detection and control of outbreaks and informing public health interventions. The first aim of this study is to investigate the degree to which Twitter users disclose health experiences related to influenza and COVID-19 that could be indicative of recent plausible influenza cases or symptomatic COVID-19 infections. Second, we seek to use the Twitter datasets to train and evaluate the classification performance of Bidirectional Encoder Representations from Transformers (BERT) and variant language models in the context of influenza and COVID-19 infection detection. We constructed two Twitter datasets using a keyword-based filtering approach on English-language tweets collected from December 2016 to December 2022 in Saskatchewan, Canada. The influenza-related dataset comprised tweets filtered with influenza-related keywords from December 13, 2016, to March 17, 2018, while the COVID-19 dataset comprised tweets filtered with COVID-19 symptom-related keywords from January 1, 2020, to June 22, 2021. The Twitter datasets were cleaned, and each tweet was annotated by at least two annotators as to whether it suggested recent plausible influenza cases or symptomatic COVID-19 cases. We then assessed the classification performance of pre-trained transformer-based language models, including BERT-base, BERT-large, RoBERTa-base, RoBERT-large, BERTweet-base, BERTweet-covid-base, BERTweet-large, and COVID-Twitter-BERT (CT-BERT) models, on each dataset. To address the notable class imbalance, we experimented with both oversampling and undersampling methods. The influenza dataset had 1129 out of 6444 (17.5%) tweets annotated as suggesting recent plausible influenza cases. The COVID-19 dataset had 924 out of 11939 (7.7%) tweets annotated as inferring recent plausible COVID-19 cases. When compared against other language models on the COVID-19 dataset, CT-BERT performed the best, supporting the highest scores for recall (94.8%), F1(94.4%), and accuracy (94.6%). For the influenza dataset, BERTweet models exhibited better performance. Our results also showed that applying data balancing techniques such as oversampling or undersampling method did not lead to improved model performance. Utilizing domain-specific language models for monitoring users' health experiences related to influenza and COVID-19 on social media shows improved classification performance and has the potential to supplement real-time disease surveillance.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Comparison of pretrained transformer-based models for influenza and COVID-19 detection using social media text data in Saskatchewan, Canada.

Abstract

Talk to us

Similar Papers

More From: Frontiers in Digital Health

Lead the way for us

Journal: Frontiers in Digital Health	Publication Date: Jun 28, 2023
License type: CC BY 4.0

Similar Papers

Psychological Stress Detection Using Transformer-Based Models
Derwin Suhartono ... Irfan Fahmi Saputra
ComTech: Computer, Mathematics and Engineering Applications | VOL. 15
Derwin Suhartono, et. al.Derwin Suhartono ... Irfan Fahmi Saputra
22 Jun 2024
ComTech: Computer, Mathematics and Engineering Applications | VOL. 15

Bidirectional encoders to state-of-the-art: a review of BERT and its transformative impact on natural language processing
Rajesh Gupta
Информатика. Экономика. Управление - Informatics. Economics. Management | VOL. 3
Rajesh GuptaRajesh Gupta
02 Mar 2024
Информатика. Экономика. Управление - Informatics. Economics. Management | VOL. 3

Calibration of Transformer-Based Models for Identifying Stress and Depression in Social Media
Loukas Ilias ... Dimitris Askounis
IEEE Transactions on Computational Social Systems | VOL. 11
Loukas Ilias, et. al.Loukas Ilias ... Dimitris Askounis
01 Apr 2024
IEEE Transactions on Computational Social Systems | VOL. 11

Oversampling effect in pretraining for bidirectional encoder representations from transformers (BERT) to localize medical BERT and enhance biomedical BERT
Shoya Wada ... Yasushi Matsumura
Artificial Intelligence In Medicine | VOL. 153
Shoya Wada, et. al.Shoya Wada ... Yasushi Matsumura
05 May 2024
Artificial Intelligence In Medicine | VOL. 153

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Comparison of pretrained transformer-based models for influenza and COVID-19 detection using social media text data in Saskatchewan, Canada.

Abstract

Talk to us

Similar Papers

More From: Frontiers in Digital Health