Big Data, Natural Language Processing, and Deep Learning to Detect and Characterize Illicit COVID-19 Product Sales: Infoveillance Study on Twitter and Instagram.

Tim Ken Mackey, Matthew Nali, Vidya Purushothaman, Bryan Liang, Neal Shah, Mingxiang Cai, Jiawei Li, Cortni Bardier

doi:10.2196/20794

Abstract

Background: The coronavirus disease (COVID-19) pandemic is perhaps the greatest global health challenge of the last century. Accompanying this pandemic is a parallel “infodemic,” including the online marketing and sale of unapproved, illegal, and counterfeit COVID-19 health products including testing kits, treatments, and other questionable “cures.” Enabling the proliferation of this content is the growing ubiquity of internet-based technologies, including popular social media platforms that now have billions of global users.

Objective: This study aims to collect, analyze, identify, and enable reporting of suspected fake, counterfeit, and unapproved COVID-19–related health care products from Twitter and Instagram.

Methods: This study was conducted in two phases. The first phase involved the collection of COVID-19–related Twitter and Instagram posts, using web scraping on Instagram and filtering the public streaming Twitter application programming interface (API) for keywords associated with suspect marketing and sale of COVID-19 products. The second phase involved data analysis using natural language processing (NLP) and deep learning to identify potential sellers, whose posts were then manually annotated for characteristics of interest. We also visualized illegal selling posts on a customized data dashboard to enable public health intelligence.

Results: We collected a total of 6,029,323 tweets and 204,597 Instagram posts filtered for terms associated with suspect marketing and sale of COVID-19 health products from March to April for Twitter and February to May for Instagram. After applying our NLP and deep learning approaches, we identified 1271 tweets and 596 Instagram posts associated with questionable sales of COVID-19–related products. Generally, product introduction came in two waves, with the first consisting of questionable immunity-boosting treatments and the second involving suspect testing kits. We also detected a low volume of pharmaceuticals that have not been approved for COVID-19 treatment. Other major themes detected included products offered in different languages, various claims of product credibility, completely unsubstantiated products, unapproved testing modalities, and different payment and seller contact methods.

Conclusions: Results from this study provide initial insight into one front of the “infodemic” fight against COVID-19 by characterizing what types of health products, selling claims, and types of sellers were active on two popular social media platforms at earlier stages of the pandemic. This cybercrime challenge is likely to continue as the pandemic progresses and more people seek access to COVID-19 testing and treatment. This data intelligence can help public health agencies, regulatory authorities, legitimate manufacturers, and technology platforms better remove and prevent this content from harming the public.

Highlights

The novel coronavirus (2019-nCoV; known as severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) and associated diagnosis, the coronavirus disease (COVID-19), have created a global crisis.
Results from this study provide initial insight into one front of the “infodemic” fight against COVID-19 by characterizing what types of health products, selling claims, and types of sellers were active on two popular social media platforms at earlier stages of the pandemic.
This retrospective big data study was conducted in two phases: (1) data collection, using the public streaming Twitter application programming interface (API) and web scraping on Instagram to collect social media posts filtered for COVID-19–related keywords, and (2) data analysis, using natural language processing (NLP) to isolate topic clusters related to COVID-19 product sales, combined with a deep learning algorithm to classify a larger volume of social media posts as “signal” posts.
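The keyword-filtering step of phase 1 can be illustrated with a minimal sketch. It is not the study's code: the term lists and the function names (`is_candidate`, `filter_posts`) are hypothetical, the posts are assumed to be already-collected strings, and the later topic-clustering and deep learning steps are not shown.

```python
import re
from typing import Iterable, List

# Hypothetical term lists; the study's actual filter keywords are not given here.
COVID_TERMS = ("covid", "coronavirus", "corona")
SALE_TERMS = ("buy", "order", "for sale", "price", "dm", "whatsapp")


def _has_term(text: str, terms: Iterable[str]) -> bool:
    """Return True if any term occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)


def is_candidate(post: str) -> bool:
    """Flag a post that mentions both a COVID-19 term and a selling cue."""
    text = post.lower()
    return _has_term(text, COVID_TERMS) and _has_term(text, SALE_TERMS)


def filter_posts(posts: Iterable[str]) -> List[str]:
    """Keep only the posts that pass the keyword filter."""
    return [p for p in posts if is_candidate(p)]
```

A filter like this only narrows the pool of posts. Under the design described above, the filtered posts would still need topic clustering, classification, and manual annotation before any of them is treated as a "signal" post.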

Summary

Introduction

The novel coronavirus (2019-nCoV; known as severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) and associated diagnosis, the coronavirus disease (COVID-19), have created a global crisis. With the advent of social media, an information-sharing culture, and the worldwide spread of technologies for accessing these platforms (ie, mobile and broadband access), the more than 2.9 billion global social media users have more information resources to help them understand and protect themselves against the coronavirus [4].

Journal: JMIR Public Health and Surveillance	Publication Date: Aug 25, 2020
Citations: 81	License type: cc-by
