Abstract

Social media data are an important resource for behavioral analysis of the aging population. This paper addresses the problem of age prediction from Twitter data, framed as a classification task. For this purpose, we devise an innovative model based on a Convolutional Neural Network (CNN) that relies on language-related features and social-media-specific metadata. More specifically, we introduce two features that have not previously been considered in the literature: the content of URLs and hashtags appearing in tweets. We also employ distributed representations of the words and phrases present in tweets, hashtags and URLs, pre-trained on appropriate corpora, in order to exploit their semantic information for age prediction. We show that our CNN-based classifier, when compared with baseline models, improves the micro-averaged F1 score by up to 12.3% on the Dutch dataset, 9.8% on the English1 dataset, and 6.6% on the English2 dataset.
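
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed for a multi-class task such as age-group prediction. The function name `micro_f1` and the age-group labels are illustrative assumptions, not taken from the paper. Micro-averaging pools true positives, false positives and false negatives over all classes before computing a single F1 value.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly predicted this class
            elif pred == label and truth != label:
                fp += 1  # predicted this class, but it was another
            elif pred != label and truth == label:
                fn += 1  # missed an instance of this class

    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical age-group labels, for illustration only.
y_true = ["young", "young", "old", "old", "mid"]
y_pred = ["young", "old", "old", "old", "young"]
score = micro_f1(y_true, y_pred)
```

When every user receives exactly one predicted label, each misclassification counts once as a false positive and once as a false negative, so the micro-averaged F1 coincides with plain accuracy (3 correct out of 5 in the example above).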

Highlights

  • In the digital era, social media has become a ubiquitous part of daily life: users constantly interact with Facebook, Twitter, Snapchat, and other social media platforms, sharing their experiences and opinions on various topics

  • While these features improve the results, further improvement is obtained by using distributed representations of words incorporated in our novel 2-channel convolutional neural network (CNN) model

  • The number of words with non-standard spellings was found to be negatively correlated with age among the older age group of Twitter users, while it was positively correlated with age in the younger group
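
As a rough illustration of how the hashtag and URL features mentioned in the abstract might be separated from the remaining tweet text, here is a minimal sketch. The function name `split_tweet`, the regular expressions and the sample tweet are assumptions made for illustration, not the authors' preprocessing pipeline. The sketch only isolates the URL strings themselves and does not fetch or analyze the pages they point to.

```python
import re

# Simplified patterns (assumptions): real tweets may need more robust handling.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def split_tweet(tweet):
    """Split a tweet into plain text, hashtags and URLs."""
    urls = URL_RE.findall(tweet)
    # Remove URLs first so that '#fragment' parts of URLs are not read as hashtags.
    without_urls = URL_RE.sub(" ", tweet)
    hashtags = HASHTAG_RE.findall(without_urls)
    text = HASHTAG_RE.sub(" ", without_urls)
    text = " ".join(text.split())  # collapse leftover whitespace
    return {"text": text, "hashtags": hashtags, "urls": urls}


# Hypothetical tweet, for illustration only.
parts = split_tweet("Loving my new bike! #CyclingLife https://example.com/bikes/review")
```

Each of the three outputs could then be mapped to its own pre-trained embedding vocabulary before being fed to a classifier.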


Summary

Introduction

Social media has become a ubiquitous part of daily life: users constantly interact with Facebook, Twitter, Snapchat, and other social media platforms, sharing their experiences and opinions on various topics. The availability of many social media datasets (e.g., Twitter, public Facebook pages and blogs) offers social scientists golden opportunities to study psychological and social questions at an unprecedented scale [1]. The open access of many social media platforms has made it possible for people of every age to become authors and readers without any formal restriction. This has created an ideal environment for online predators to gain access to sensitive user-related information, which puts the internet activities of many vulnerable communities (e.g., kids, teenagers, females) at risk. Automatic identification of age groups from social media posts would offer an edge to crime prevention as

