Enhancing sentiment and intent analysis in public health via fine-tuned Large Language Models on tobacco and e-cigarette-related tweets.

Sherif Elmitwalli, John Mehegan, Allen Gallagher, Raouf Alebshehy

doi:10.3389/fdata.2024.1501154

Open Access

Journal: Frontiers in Big Data
Publication Date: Jan 1, 2024
License type: cc-by

Abstract

Accurate sentiment analysis and intent categorization of tobacco- and e-cigarette-related social media content are critical for public health research, yet they require specialized natural language processing approaches. This study compared pre-trained and fine-tuned Flan-T5 models for intent classification and sentiment analysis of tobacco and e-cigarette tweets, with the aim of demonstrating the effectiveness of fine-tuning a lightweight large language model for domain-specific tasks.

Three Flan-T5 classification models were developed: (1) tobacco intent, (2) e-cigarette intent, and (3) sentiment analysis. Domain-specific datasets of tobacco and e-cigarette tweets were created using GPT-4 and validated by tobacco control specialists through a rigorous evaluation process, in which a standardized rubric and a consensus mechanism among domain specialists ensured high-quality datasets. The Flan-T5 models were fine-tuned using Low-Rank Adaptation (LoRA) and evaluated against pre-trained baselines on these datasets, with accuracy as the performance metric. To further assess generalizability and robustness, the fine-tuned models were also evaluated on real-world tweets collected around the COP9 event.

Fine-tuned models substantially outperformed pre-trained models on every task. For tobacco intent classification, the fine-tuned model achieved an overall accuracy of 0.91, compared with 0.33 for the pre-trained model. For e-cigarette intent, accuracy rose from 0.36 (pre-trained) to 0.93 (fine-tuned), and for sentiment analysis from 0.65 to 0.94. Domain-specific fine-tuning therefore significantly improves the effectiveness of lightweight Flan-T5 models in analyzing tobacco- and e-cigarette-related tweets, providing highly accurate instruments for tracking public conversation about tobacco and e-cigarettes.
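The abstract names Low-Rank Adaptation but not its settings. The following pure-Python sketch illustrates the general LoRA idea for a single linear layer: the pre-trained weight W0 stays frozen and only two small factors, A (r × k) and B (d × r), are trained, giving an effective weight W0 + (alpha / r) · B · A. The matrix sizes, rank r, scaling alpha, and initialisation below are illustrative assumptions, not the authors' configuration; production implementations (for example the Hugging Face PEFT library) apply the same kind of update inside the model's layers, and the abstract does not say which tooling was used.

```python
import random

def matmul(X, Y):
    """Multiply matrices stored as lists of rows (X: n x m, Y: m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_weight(W0, A, B, alpha, r):
    """Effective weight W0 + (alpha / r) * (B @ A); W0 itself is never modified."""
    delta = matmul(B, A)
    s = alpha / r
    return [[w + s * dw for w, dw in zip(w_row, d_row)]
            for w_row, d_row in zip(W0, delta)]

def param_counts(d, k, r):
    """Trainable parameters for one d x k layer: full fine-tuning vs LoRA (A and B only)."""
    return d * k, r * (d + k)

# Illustrative (assumed) sizes: output dim d, input dim k, rank r, scaling alpha.
d, k, r, alpha = 6, 4, 2, 4.0
rng = random.Random(0)

# Frozen pre-trained weight (d x k).
W0 = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
# Trainable factors: A gets small random values, B starts at zero.
A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]

W_eff = lora_weight(W0, A, B, alpha, r)
print(W_eff == W0)                 # True: with B = 0 the layer equals the pre-trained one
print(param_counts(768, 768, 8))   # (589824, 12288)
```

Because B starts at zero, the adapted layer initially reproduces the pre-trained model exactly. For a hypothetical 768 × 768 projection with rank 8, LoRA trains 12,288 parameters instead of 589,824, which is what keeps fine-tuning a lightweight model inexpensive.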
The involvement of domain specialists in dataset validation ensured that the generated content accurately represented real-world discussions, thereby enhancing the quality and reliability of the results. These findings could inform tobacco control research and the formulation of public policy.
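Flan-T5 is a text-to-text model, so a classifier built on it generates its label as text, which must be mapped back to a category before an accuracy such as 0.94 or 0.65 can be computed. The sketch below shows one way to score such outputs for the sentiment task. The prompt wording, the three-way label set, the prefix-matching rule, and the toy outputs are all hypothetical; none of them are taken from the paper.

```python
from typing import Optional, Sequence

# Hypothetical label set and prompt; the paper's exact wording is not given in the abstract.
LABELS = ("positive", "negative", "neutral")

def build_prompt(tweet: str) -> str:
    """Wrap a tweet in an instruction-style prompt for a text-to-text model."""
    options = ", ".join(LABELS)
    return f"Classify the sentiment of this tweet as one of: {options}.\nTweet: {tweet}"

def parse_label(generated: str) -> Optional[str]:
    """Map generated text to a known label, or None if it starts with no known label."""
    text = generated.strip().lower()
    for label in LABELS:
        if text.startswith(label):
            return label
    return None

def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose parsed prediction equals the gold label."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal in length")
    correct = sum(parse_label(p) == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Toy, made-up outputs for illustration only (not the paper's data).
gold = ["positive", "negative", "neutral", "negative"]
base_out = ["Positive", "I think it is neutral", "neutral", "positive"]
tuned_out = ["positive", "negative", "neutral", "negative."]

print(accuracy(base_out, gold), accuracy(tuned_out, gold))  # 0.5 1.0
```

Treating unparseable output (such as a free-text answer) as an error is a design choice of this sketch; the abstract does not describe the paper's scoring rule.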

Similar Papers

Fine-tuning large language models for chemical text mining.
Mingyue Zheng ... Qinggong Wang
Chemical science | VOL. 15
Mingyue Zheng, et. al.Mingyue Zheng ... Qinggong Wang
01 Jan 2024
Chemical science | VOL. 15

Efficient Multi-Lingual Sentence Classification Framework with Sentence Meta Encoders
Raj Nath Patel ... Edward Burgin
-
Raj Nath Patel, et. al.Raj Nath Patel ... Edward Burgin
15 Dec 2021
15 Dec 2021

A multimodal approach to cross-lingual sentiment analysis with ensemble of transformer and LLM
Md Saef Ullah Miah ... M. F. Mridha
Scientific Reports | VOL. 14
Md Saef Ullah Miah, et. al.Md Saef Ullah Miah ... M. F. Mridha
26 Apr 2024
Scientific Reports | VOL. 14

Sentiment analysis of Malayalam tweets using bidirectional encoder representations from transformers: a study
Syam Mohan Elankath ... Sunitha Ramamirtham
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 29
Syam Mohan Elankath, et. al.Syam Mohan Elankath ... Sunitha Ramamirtham
01 Mar 2023
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 29
