ReportAGE: Automatically extracting the exact age of Twitter users based on self-reports in tweets.

Ari Z Klein, Graciela Gonzalez-Hernandez, Arjun Magge

doi:10.1371/journal.pone.0262087

Open Access

Journal: PLOS ONE
Publication Date: Jan 25, 2022
Citations: 12
License type: CC BY 4.0

Affiliation: University of Pennsylvania

Abstract

Advancing the utility of social media data for research applications requires methods for automatically detecting demographic information about social media study populations, including users’ age. The objective of this study was to develop and evaluate a method that automatically identifies the exact age of users based on self-reports in their tweets. Our end-to-end automatic natural language processing (NLP) pipeline, ReportAGE, includes query patterns to retrieve tweets that potentially mention an age, a classifier to distinguish retrieved tweets that self-report the user’s exact age (“age” tweets) from those that do not (“no age” tweets), and rule-based extraction to identify the age. To develop and evaluate ReportAGE, we manually annotated 11,000 tweets that matched the query patterns. Based on 1,000 tweets that were annotated by all five annotators, inter-annotator agreement (Fleiss’ kappa) was 0.80 for distinguishing “age” from “no age” tweets, and 0.95 for identifying the exact age among the “age” tweets on which the annotators agreed. A deep neural network classifier, based on a RoBERTa-Large pretrained transformer model, achieved the highest F1-score of 0.914 (precision = 0.905, recall = 0.942) for the “age” class. When the age extraction was evaluated using the classifier’s predictions, it achieved an F1-score of 0.855 (precision = 0.805, recall = 0.914) for the “age” class. When it was evaluated directly on the held-out test set, it achieved an F1-score of 0.931 (precision = 0.873, recall = 0.998) for the “age” class. We deployed ReportAGE on a collection of more than 1.2 billion tweets, posted by 245,927 users, and predicted ages for 132,637 (54%) of them. Scaling the detection of exact age to this large number of users can advance the utility of social media data for research applications that do not align with the predefined age groupings of extant binary or multi-class classification approaches.
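The three-stage design summarized above (query patterns that over-retrieve, an “age” vs. “no age” classifier, then rule-based extraction) can be sketched in Python. This is a minimal illustration under stated assumptions: the regular expressions in `AGE_PATTERNS`, the functions `matches_query`, `extract_age`, and `report_age`, and the keyword-based `toy_classifier` are all hypothetical. They are not ReportAGE’s actual query patterns, extraction rules, or RoBERTa-Large classifier; the classifier is simply passed in as a callable to show where it plugs in.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical query patterns for illustration only; they are NOT the
# patterns used by ReportAGE. Each captures a one- or two-digit number.
AGE_PATTERNS = [
    re.compile(r"\bi\s*(?:am|'m)\s+(\d{1,2})\b", re.IGNORECASE),
    re.compile(r"\bi\s+turn(?:ed)?\s+(\d{1,2})\b", re.IGNORECASE),
    re.compile(r"\b(\d{1,2})\s*(?:years?|yrs?)\s+old\b", re.IGNORECASE),
]


def matches_query(tweet: str) -> bool:
    """Stage 1: keep tweets that *potentially* mention an age."""
    return any(p.search(tweet) for p in AGE_PATTERNS)


def extract_age(tweet: str) -> Optional[int]:
    """Stage 3: return the number captured by the first pattern
    (in list order) that matches, or None if none match."""
    for pattern in AGE_PATTERNS:
        match = pattern.search(tweet)
        if match:
            return int(match.group(1))
    return None


def report_age(
    tweets: Iterable[str], is_age_tweet: Callable[[str], bool]
) -> List[Tuple[str, int]]:
    """Run query -> classify -> extract. `is_age_tweet` stands in for the
    trained "age" vs "no age" classifier (stage 2)."""
    results = []
    for tweet in tweets:
        if matches_query(tweet) and is_age_tweet(tweet):
            age = extract_age(tweet)
            if age is not None:
                results.append((tweet, age))
    return results


# Hypothetical keyword stand-in for the classifier: reject tweets whose
# number refers to someone else or to a duration rather than the user's age.
NOT_SELF_AGE = re.compile(
    r"\b(?:my|his|her)\s+(?:son|daughter|mom|dad)\b|\bminutes?\b", re.IGNORECASE
)


def toy_classifier(tweet: str) -> bool:
    return NOT_SELF_AGE.search(tweet) is None


TWEETS = [
    "Can't believe I'm 27 today!",
    "My son is 5 years old and already reads.",
    "I am 30 minutes late again",
    "Just ate lunch",
]
print(report_age(TWEETS, toy_classifier))  # [("Can't believe I'm 27 today!", 27)]
```

The toy classifier only marks where a trained model would sit: the query patterns deliberately match tweets such as “I am 30 minutes late again” or “My son is 5 years old”, and filtering those “no age” tweets before extraction is the classifier’s job.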

Highlights

Objectives: The objective of this study was to develop and evaluate a method that automatically identifies the exact age of users based on self-reports in their tweets.
Considering that 72% of adults in the United States use social media [1], social media has been widely used as a source of data in a variety of research applications.
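For the “age” class, the Support Vector Machine (SVM) classifier achieved an F1-score of 0.772; the classifier based on the Bidirectional Encoder Representations from Transformers (BERT)-Base-Uncased pretrained model achieved an F1-score of 0.879; and the classifier based on the RoBERTa-Large pretrained model achieved an F1-score of 0.914, where:

$$F_1 = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}, \qquad \text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}, \qquad \text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$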
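The F1-score defined above is the harmonic mean of precision and recall. The short Python sketch below computes it either from precision and recall directly or from confusion-matrix counts; the helper names `f1_score` and `precision_recall_f1` are hypothetical and not part of ReportAGE. Applied to the precision and recall reported for age extraction on the held-out test set (0.873 and 0.998), it recovers the reported F1-score of 0.931.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Precision and recall reported for age extraction on the held-out test set.
print(round(f1_score(0.873, 0.998), 3))  # 0.931

# Hypothetical counts for illustration: 90 true positives,
# 10 false positives, 30 false negatives.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.9 0.75 0.818
```

Because the reported precision and recall are rounded to three decimals, an F1-score recomputed from them can differ from the reported value in the last digit.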

Similar Papers

Sentiment Analysis for Health Monitoring on Social Media
Rabia Bounaama ... Mohammed El Amine Abderrahim
01 Feb 2020

Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set.
Ari Z Klein ... Karen O'Connor
Journal of Medical Internet Research | VOL. 23
22 Jan 2021

Advancing Accessibility through Automatic Speech Recognition and NLP Integration
Gayathri Shivaraj
International Journal for Research in Applied Science and Engineering Technology | VOL. 12
30 Jun 2024

Applicable strategy to choose and deploy a MOOC platform with multilingual AQG feature
Younes-Aziz Bachiri ... Hicham Mouncif
28 Nov 2020
