Social Media Monitoring of the COVID-19 Pandemic and Influenza Epidemic With Adaptation for Informal Language in Arabic Twitter Data: Qualitative Study.

Lama Alsudias, Paul Rayson

doi:10.2196/27670

Abstract

Background: Twitter is a real-time messaging platform widely used by people and organizations to share information on many topics. Systematic monitoring of social media posts (infodemiology or infoveillance) could be useful to detect misinformation outbreaks as well as to reduce reporting lag time and to provide an independent complementary source of data compared with traditional surveillance approaches. However, such an analysis is currently not possible in the Arabic-speaking world owing to a lack of basic building blocks for research and dialectal variation.

Objective: We collected around 4000 Arabic tweets related to COVID-19 and influenza. We cleaned and labeled the tweets relative to the Arabic Infectious Diseases Ontology, which includes nonstandard terminology, as well as 11 core concepts and 21 relations. The aim of this study was to analyze Arabic tweets to estimate their usefulness for health surveillance, understand the impact of the informal terms in the analysis, show the effect of deep learning methods in the classification process, and identify the locations where the infection is spreading.

Methods: We applied the following multilabel classification techniques: binary relevance, classifier chains, label power set, adapted algorithm (multilabel adapted k-nearest neighbors [MLKNN]), support vector machine with naive Bayes features (NBSVM), bidirectional encoder representations from transformers (BERT), and AraBERT (transformer-based model for Arabic language understanding) to identify tweets appearing to be from infected individuals. We also used named entity recognition to predict the place names mentioned in the tweets.

Results: We achieved an F1 score of up to 88% in the influenza case study and 94% in the COVID-19 one. Adapting for nonstandard terminology and informal language helped to improve accuracy by as much as 15%, with an average improvement of 8%. Deep learning methods achieved an F1 score of up to 94% during the classifying process. Our geolocation detection algorithm had an average accuracy of 54% for predicting the location of users according to tweet content.

Conclusions: This study identified two Arabic social media data sets for monitoring tweets related to influenza and COVID-19. It demonstrated the importance of including informal terms, which are regularly used by social media users, in the analysis. It also proved that BERT achieves good results when used with new terms in COVID-19 tweets. Finally, the tweet content may contain useful information to determine the location of disease spread.
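As a rough illustration of two of the problem-transformation techniques named in the Methods (binary relevance and label power set) and of an F1 score for multilabel output, the following is a minimal sketch in plain Python. It is not the authors' implementation. The label names, the toy tweets, and the choice of micro-averaged F1 are all assumptions made for illustration; the abstract does not state which label set or which F1 averaging was used.

```python
# Hypothetical label names and toy annotations; not taken from the study.
LABELS = ["infected", "news", "prevention"]
tweet_labels = [
    {"infected"},
    {"news", "prevention"},
    {"infected", "news"},
    {"news", "prevention"},
]


def binary_relevance_targets(label_sets, labels):
    """Binary relevance: build one independent 0/1 target column per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def label_powerset_targets(label_sets):
    """Label power set: map each distinct label combination to one class id."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = tuple(sorted(s))
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over all (tweet, label) decisions (assumed averaging)."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


br_targets = binary_relevance_targets(tweet_labels, LABELS)
lp_targets, lp_classes = label_powerset_targets(tweet_labels)

# Hypothetical predictions that miss one label on the second tweet.
predicted = [{"infected"}, {"news"}, {"infected", "news"}, {"news", "prevention"}]
score = micro_f1(tweet_labels, predicted)
```

With binary relevance, each column in `br_targets` would be handed to a separate binary classifier. With label power set, a single multiclass classifier would be trained on `lp_targets`. Classifier chains, not shown here, extend binary relevance by feeding each classifier the predictions of the previous ones.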

Highlights

Millions of items of data appear every day on social media. Artificial intelligence, through natural language processing (NLP) and machine learning (ML) algorithms, offers the chance to automate their analysis across many different areas, including health.
This paper has, for the first time, shown that Arabic social media data contain a variety of suitable information for monitoring influenza and COVID-19, and crucially, it has improved on previous research methodologies by including informal language.
We introduced a new Arabic social media data set for analyzing tweets related to influenza and COVID-19.

Summary

Introduction

Background: Millions of items of data appear every day on social media. Artificial intelligence, through natural language processing (NLP) and machine learning (ML) algorithms, offers the chance to automate their analysis across many different areas, including health. Systematic monitoring of social media posts (infodemiology or infoveillance) could be useful to detect misinformation outbreaks as well as to reduce reporting lag time and to provide an independent complementary source of data compared with traditional surveillance approaches. Such an analysis is currently not possible in the Arabic-speaking world owing to a lack of basic building blocks for research and dialectal variation. Conclusions: This study identified two Arabic social media data sets for monitoring tweets related to influenza and COVID-19. It demonstrated the importance of including informal terms, which are regularly used by social media users, in the analysis. The tweet content may contain useful information to determine the location of disease spread.


Journal: JMIR Medical Informatics
Publication Date: Sep 17, 2021
Citations: 12
License type: CC BY
