Social Media Corpus Research Articles

Manually analyzing public health-related content from social media provides valuable insights into the beliefs, attitudes, and behaviors of individuals, shedding light on trends and patterns that can inform public understanding, policy decisions, targeted interventions, and communication strategies. Unfortunately, the time and effort needed from well-trained human subject matter experts make extensive manual social media listening unfeasible. Generative large language models (LLMs) can potentially summarize and interpret large amounts of text, but it is unclear to what extent LLMs can glean subtle health-related meanings in large sets of social media posts and reasonably report health-related themes. We aimed to assess the feasibility of using LLMs for topic model selection or inductive thematic analysis of large sets of social media posts by attempting to answer the following question: Can LLMs conduct topic model selection and inductive thematic analysis as effectively as humans did in a prior manual study, or at least reasonably, as judged by subject matter experts? For both the LLM selection of relevant topics and the LLM analysis of themes, we asked the same research question and used the same set of social media content as a published study about vaccine rhetoric that was conducted manually. We used the results from that study as the reference for this LLM experiment, comparing the prior manual human analyses with the analyses from 3 LLMs: GPT4-32K, Claude-instant-100K, and Claude-2-100K. We also assessed whether the LLMs had equivalent ability and assessed the consistency of repeated analyses from each LLM.

The LLMs generally gave high rankings to the topics chosen previously by humans as most relevant. We reject a null hypothesis (P<.001, overall comparison) and conclude that these LLMs are more likely to include the human-rated top 5 content areas in their top rankings than would occur by chance. Regarding theme identification, LLMs identified several themes similar to those identified by humans, with very low hallucination rates. Variability occurred between LLMs and between test runs of an individual LLM. Although the LLM-generated themes did not consistently match the human-generated themes, subject matter experts found them reasonable and relevant. LLMs can effectively and efficiently process large social media-based health-related data sets, and they can extract themes from such data that human subject matter experts deem reasonable. However, we were unable to show that the LLMs we tested can replicate the depth of analysis from human subject matter experts by consistently extracting the same themes from the same data. Once better validated, automated LLM-based real-time social listening has vast potential for common and rare health conditions, informing public health understanding of the public's interests and concerns and determining the public's ideas to address them.
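The comparison against chance reported above can be illustrated with a small calculation. The sketch below is not the test used in the study, which the abstract does not specify; it assumes a simple hypergeometric model in which an LLM's top ranking is a random draw from the topic pool. The function name and all numbers (20 topics, a top-5 list, at least 3 matches) are hypothetical and chosen only for illustration.

```python
from math import comb


def prob_at_least_k_overlap(n_topics: int, n_human: int, n_llm: int, k: int) -> float:
    """Chance probability that a random top-n_llm list drawn from n_topics
    contains at least k of the n_human reference topics.

    Assumption: under the null hypothesis the ranking is a uniformly random
    subset, so the overlap follows a hypergeometric distribution.
    """
    total = comb(n_topics, n_llm)        # number of possible top-n_llm lists
    upper = min(n_human, n_llm)          # the overlap cannot exceed either list
    favorable = sum(
        comb(n_human, i) * comb(n_topics - n_human, n_llm - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


# Hypothetical numbers, not taken from the study: 20 candidate topics,
# 5 human-selected topics, an LLM top-5 list, and at least 3 matches.
p_chance = prob_at_least_k_overlap(n_topics=20, n_human=5, n_llm=5, k=3)
print(f"Probability of >=3 matches by chance: {p_chance:.4f}")
```

A small value of `p_chance` for the observed overlap would suggest that the agreement is unlikely to arise from random ranking alone; a full analysis would also need to account for repeated runs and multiple models.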

Based on a new Turkish social media influencer corpus consisting of 30 YouTube vlogs, this study explores young adults’ translanguaging practices, prevalent linguistic events, and spatial repertoires in communication. The analysis focuses on vlogs, in which linguistic resources become available in relation to the activities, the audience, and the organization of places or objects, and on young adults’ translanguaging practices and their situated functions. The findings demonstrate how translanguaging can be used as a comprehensive methodology for analyzing communication that unfolds through the multimodal, multilingual, and digital semiotic repertoires at speakers’ disposal and that is produced for the imagined audience integral to communication in vlogs. Taking a corpus-linguistics perspective on translanguaging, we propose a usage-based focus and adopt a broader understanding of translanguaging as a linguistic and multimodal phenomenon.

Articles published on Social Media Corpus

Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.

People believe political opponents accept blatant moral wrongs, fueling partisan divides.

Hebrew Loanwords in Two Rural Dialects of Arabic in Israel

Overview of the 8th Social Media Mining for Health Applications (#SMM4H) shared tasks at the AMIA 2023 Annual Symposium.

Design and construction of a social media corpus: Influencers’ speech in vlogs

Translanguaging dynamics in the digital landscape: insights from a social media corpus

A New Social Media Analytics Method for Identifying Factors Contributing to COVID-19 Discussion Topics

Hack your corpus analysis: How AI can assist corpus linguists deal with messy social media data

Strategies for the Analysis of Large Social Media Corpora: Sampling and Keyword Extraction Methods

Characterization Frames Constructing Endoxa in Activists’ Discourse About the Public Controversy Surrounding Fashion Sustainability

Detecting racial stereotypes: An Italian social media corpus where psychology meets NLP

Mapping Digital Discourses of the Capital Region of Finland

Public perception and usage of the term carbon: Linguistic analysis in an environmental social media corpus

CREDBANK: A Large-Scale Social Media Corpus With Associated Credibility Annotations

Generate Adjective Sentiment Dictionary for Social Media Sentiment Analysis Using Constrained Nonnegative Matrix Factorization

The use of hēi diào (‘to turn black’) and its related [V diào] forms in social media

Information extraction from digital social trace data with applications to social media and scholarly communication data

‘Bad language’ in the Nordics: profanity and gender in a social media corpus

Building and Analyzing Panic Disorder Social Media Corpus for Automatic Deep Learning Classification Model

Deep Learning Based Sentiment Analysis in a Code-Mixed English-Hindi and English-Bengali Social Media Corpus
