Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis.

Rob Chew, James Nonnemaker, Annice Kim, Jamie Guillory, Michael Wenger

doi:10.2196/30257

Open Access

https://doi.org/10.2196/30257

Journal: Journal of Medical Internet Research
Publication Date: Jan 18, 2022
Citations: 6
License type: CC BY

Affiliation: RTI International

Abstract

Background: Electronic nicotine delivery system (ENDS) brands, such as JUUL, used social media as a key component of their marketing strategy, which led to massive sales growth from 2015 to 2018. During this time, ENDS use rapidly increased among youths and young adults, with flavored products being particularly popular among these groups.

Objective: The aim of our study is to develop a named entity recognition (NER) model to identify potential emerging vaping brands and flavors from Instagram post text. NER is a natural language processing task for identifying specific types of words (entities) in text based on the characteristics of the entity and surrounding words.

Methods: NER models were trained on a labeled data set of 2272 Instagram posts coded for ENDS brands and flavors. We compared three types of NER models (conditional random fields, a residual convolutional neural network, and a fine-tuned distilled bidirectional encoder representations from transformers [FTDB] network) to identify brands and flavors in Instagram posts, with key model outcomes of precision, recall, and F1 scores. We used data from Nielsen scanner sales and Wikipedia to create benchmark dictionaries to determine whether brands from established ENDS brand and flavor lists were mentioned in the Instagram posts in our sample. To prevent overfitting, we performed 5-fold cross-validation and reported the mean and SD of the model validation metrics across the folds.

Results: For brands, the residual convolutional neural network exhibited the highest mean precision (0.797, SD 0.084), and the FTDB exhibited the highest mean recall (0.869, SD 0.103). For flavors, the FTDB exhibited both the highest mean precision (0.860, SD 0.055) and recall (0.801, SD 0.091). All NER models outperformed the benchmark brand and flavor dictionary look-ups on mean precision, recall, and F1. Comparing between the benchmark brand lists, the larger Wikipedia list outperformed the Nielsen list in both precision and recall.

Conclusions: Our findings suggest that NER models correctly identified ENDS brands and flavors in Instagram posts at rates competitive with, or better than, others in the published literature. Brands identified during manual annotation showed little overlap with those in Nielsen scanner data, suggesting that NER models may capture emerging brands with limited sales and distribution. NER models address the challenges of manual brand identification and can be used to support future infodemiology and infoveillance studies. Brands identified on social media should be cross-validated with Nielsen and other data sources to differentiate emerging brands that have become established from those with limited sales and distribution.
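The dictionary look-up benchmark and the precision, recall, and F1 outcomes described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-entry dictionary, the toy posts, the gold labels, and the invented brand name "cloudz" are not taken from the study, and the whole-token matching rule is only one plausible way to implement a look-up.

```python
import re

# Hypothetical benchmark dictionary. The study built its lists from Nielsen
# scanner data and Wikipedia; those lists are not reproduced here.
BRAND_DICTIONARY = {"juul", "vuse", "blu"}


def dictionary_lookup(text, dictionary):
    """Return the dictionary entries that appear as whole tokens in text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return tokens & dictionary


def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 over sets of (post_id, entity) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy posts and gold annotations, invented for illustration only.
posts = {
    1: "Loving my new JUUL pods #vape",
    2: "Cloudz mango juice just dropped",  # "cloudz" is a made-up brand
    3: "blu sky today, no vaping",         # "blu" is not used as a brand here
}
gold = {(1, "juul"), (2, "cloudz")}

predicted = {
    (post_id, brand)
    for post_id, text in posts.items()
    for brand in dictionary_lookup(text, BRAND_DICTIONARY)
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the look-up misses the brand that is absent from the dictionary (a false negative) and flags an ordinary word that happens to match an entry (a false positive). Both are failure modes that a context-aware NER model could in principle avoid.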

Highlights

Social media platforms provide opportunities for brands to market products to users and potential users of tobacco products [1,2,3].
Although large increases in model performance are not uncommon when starting from pretrained models for computer vision and natural language processing tasks, the sizable leap in mean F1 scores between the FTDB and the second best-performing model, the conditional random field (CRF), suggests that fine-tuning pretrained models for named entity recognition (NER) tasks on social media is worth the added complexity.

Summary

Introduction

Background: Social media platforms provide opportunities for brands to market products to users and potential users of tobacco products [1,2,3]. Electronic nicotine delivery system (ENDS) brands, such as JUUL, used social media as a key component of their marketing strategy, which led to massive sales growth from 2015 to 2018. During this time, ENDS use rapidly increased among youths and young adults, with flavored products being popular among these groups. We compared three types of NER models (conditional random fields, a residual convolutional neural network, and a fine-tuned distilled bidirectional encoder representations from transformers [FTDB] network) to identify brands and flavors in Instagram posts, with key model outcomes of precision, recall, and F1 scores. Brands identified on social media should be cross-validated with Nielsen and other data sources to differentiate emerging brands that have become established from those with limited sales and distribution.
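The abstract reports each of these metrics as a mean and SD across 5 cross-validation folds of the 2272 labeled posts. A minimal sketch of that bookkeeping follows. The per-fold F1 values are hypothetical placeholders rather than results from the study, the contiguous, unshuffled split is an assumption, and so is the use of the sample SD (the source does not say which SD formula was used).

```python
from statistics import mean, stdev


def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous validation folds of near-equal size."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional item each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def summarize(scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return mean(scores), stdev(scores)


# 2272 labeled posts and 5 folds, as stated in the abstract.
folds = k_fold_indices(2272, 5)
fold_sizes = [len(fold) for fold in folds]

# Hypothetical per-fold F1 scores for one model; NOT results from the study.
# In practice each value would come from training on four folds and
# validating on the held-out fold.
fold_f1 = [0.80, 0.85, 0.75, 0.90, 0.70]

mean_f1, sd_f1 = summarize(fold_f1)
print(f"fold sizes: {fold_sizes}")
print(f"F1: mean {mean_f1:.3f}, SD {sd_f1:.3f}")
```

Reporting the SD alongside the mean shows how much a model's score moves from fold to fold, which matters when comparing models whose means are close.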

Similar Papers

BiodiViz: Leveraging NER and RE for Automated Knowledge Graph Generation in Biodiversity Research
Angela Shannen Tan ... Roselyn Gabud
Biodiversity Information Science and Standards | VOL. 8
29 Oct 2024

A Disease Identification Algorithm for Medical Crowdfunding Campaigns: Validation Study.
Steven S Doerstling ... Matthew M Engelhard
Journal of Medical Internet Research | VOL. 24
21 Jun 2022

Using Recurrent Neural Networks to Extract High-Quality Information From Lung Cancer Screening Computerized Tomography Reports for Inter-Radiologist Audit and Feedback Quality Improvement.
Yucheng Zhang ... Benjamin M.M Grant
JCO Clinical Cancer Informatics | VOL. 7
01 Mar 2023

Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study.
Shengyu Liu ... Anran Wang
JMIR Medical Informatics | VOL. 12
17 Oct 2024
