InstaSynth: Opportunities and Challenges in Generating Synthetic Instagram Data with ChatGPT for Sponsored Content Detection

Thales Bertaglia, Lily Heisig, Rishabh Kaushal, Adriana Iamnitchi

doi:10.1609/icwsm.v18i1.31303

Abstract

Large Language Models (LLMs) raise concerns about lowering the cost of generating texts that could be used for unethical or illegal purposes, especially on social media. This paper investigates the promise of such models to help enforce legal requirements related to the disclosure of sponsored content online. We investigate the use of LLMs for generating synthetic Instagram captions with two objectives: The first objective (fidelity) is to produce realistic synthetic datasets. For this, we implement content-level and network-level metrics to assess whether synthetic captions are realistic. The second objective (utility) is to create synthetic data useful for sponsored content detection. For this, we evaluate the effectiveness of the generated synthetic data for training classifiers to identify undisclosed advertisements on Instagram. Our investigations show that the objectives of fidelity and utility may conflict and that prompt engineering is a useful but insufficient strategy. Additionally, we find that while individual synthetic posts may appear realistic, collectively they lack diversity, topic connectivity, and realistic user interaction patterns.

Journal: Proceedings of the International AAAI Conference on Web and Social Media
Publication Date: May 28, 2024
Citations: 1

Similar Papers

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
Galit Shmueli ... Olivia R Liu Sheng
INFORMS Journal on Data Science | VOL. 2
01 Apr 2023

Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.
Michael S Deiner ... Tim K Mackey
JMIR infodemiology | VOL. 4
29 Aug 2024

Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study
Perttu Hämäläinen ... Anton Kunnari
19 Apr 2023

Synthetic Replacements for Human Survey Data? The Perils of Large Language Models
James Bisbee ... Brenton Kenkel
Political Analysis
17 May 2024
