Aggregating Twitter Text through Generalized Linear Regression Models for Tweet Popularity Prediction and Automatic Topic Classification.

Chen Mo, Jingjing Yin, Isaac Chun-Hai Fung, Zion Tsz Ho Tse

doi:10.3390/ejihpe11040109

Abstract

Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes and explores the feasibility of a novel yet straightforward method that regresses the outcome of interest on aggregated influence scores for association and/or classification analyses based on generalized linear models. The method reduces the document term matrix by transforming text data into a continuous summary score, thereby substantially reducing the data dimension and easing the data sparsity issue of the term matrix. To illustrate the proposed method in detailed steps, we used three Twitter datasets on various topics: autism spectrum disorder, influenza, and violence against women. We found that our results were generally consistent with the critical factors associated with each public health topic in the existing literature. The proposed method could also classify tweets into different topic groups appropriately, with performance consistent with that of existing text mining methods for automatic classification based on tweet contents.
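As a rough illustration of the idea in the abstract, the sketch below collapses each tweet's terms into one continuous score. The abstract does not specify how the influence scores are computed, so the per-term measure used here (a smoothed log ratio of mean outcomes) is an assumption chosen for simplicity, and the names `term_influence`, `aggregate_score`, and the toy data are hypothetical.

```python
import math

def term_influence(tweets, outcomes, smooth=0.5):
    """Assumed per-term influence: smoothed log ratio of the mean outcome
    in tweets that contain the term versus tweets that do not."""
    vocab = {term for tweet in tweets for term in tweet}
    scores = {}
    for term in vocab:
        with_term = [y for tw, y in zip(tweets, outcomes) if term in tw]
        without = [y for tw, y in zip(tweets, outcomes) if term not in tw]
        # Smoothing keeps both means positive so the log is always defined.
        mean_with = (sum(with_term) + smooth) / (len(with_term) + 1)
        mean_without = (sum(without) + smooth) / (len(without) + 1)
        scores[term] = math.log(mean_with / mean_without)
    return scores

def aggregate_score(tweet, scores):
    """Sum the influence of each distinct term; unseen terms contribute 0."""
    return sum(scores.get(term, 0.0) for term in set(tweet))

# Toy data (hypothetical): tokenized tweets and their retweet counts.
tweets = [["flu", "shot"], ["flu", "vaccine"], ["cat"], ["cat", "photo"]]
retweets = [10, 8, 0, 1]

scores = term_influence(tweets, retweets)
summary = [aggregate_score(tweet, scores) for tweet in tweets]
```

The resulting `summary` list holds one number per tweet, which could then serve as the single covariate in a generalized linear model instead of the full document term matrix.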

Highlights

Analysis of a wide range of health topics has been conducted using text data collected from different social platforms, like Facebook and Twitter [1,2,3].
Some terms were removed during data cleansing because their overall frequency in the dataset was less than 500, so missing values were generated for tweets that contained only terms appearing fewer than 500 times in the whole dataset.
The results showed that the Hurdle models provided a better fit to the data, with lower Akaike
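The frequency filter mentioned in the highlights can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `drop_rare_terms` and the toy data are hypothetical, and a threshold of 2 stands in for the 500 used in the study so the example stays small.

```python
from collections import Counter

def drop_rare_terms(tweets, min_count):
    """Remove terms whose total frequency across all tweets is below min_count."""
    totals = Counter(term for tweet in tweets for term in tweet)
    kept = [[t for t in tweet if totals[t] >= min_count] for tweet in tweets]
    # Tweets left with no terms have nothing to score, so they become missing.
    empty = [i for i, tweet in enumerate(kept) if not tweet]
    return kept, empty

# Toy data (hypothetical): only "flu" appears at least twice.
tweets = [["flu", "shot"], ["flu", "season"], ["rare", "words"], ["flu"]]
kept, empty = drop_rare_terms(tweets, min_count=2)
```

Here the third tweet contains only rare terms, so it ends up empty after filtering, which mirrors how missing values arise in the description above.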

Summary

Introduction

Analysis of a wide range of health topics has been conducted using text data collected from different social platforms, like Facebook and Twitter [1,2,3]. Many studies have revealed that one of the limitations of using data from social media for health-related research is the poor reliability and validity of the data [4,5]. The data observed in real-world social media usually show diverse combinations of terms in different sentences, producing a sparse document term matrix (DTM). There is a paucity of relevant data that can be used in data analysis. The problem of the diversity of terms in big text data leads to questions regarding the reliability and validity of information gleaned from social media for health-related studies [5].

Journal: European Journal of Investigation in Health, Psychology and Education
Publication Date: Nov 26, 2021
Citations: 3
License type: CC BY 4.0
