Abstract

To what extent does our online activity reveal who we are? Recent research has demonstrated that the digital traces individuals leave as they browse and interact with others online can reveal who they are and what their interests are. In the present paper we report a systematic review that synthesises current evidence on predicting demographic attributes from online digital traces. Studies were included if they met the following criteria: (i) they reported findings in which at least one demographic attribute was predicted or inferred from at least one form of digital footprint, (ii) the method of prediction was automated, and (iii) the traces were either visible (e.g. tweets) or non-visible (e.g. clickstreams). We identified 327 studies published up to October 2018. Across these articles, 14 demographic attributes were successfully inferred from digital traces; the most studied included gender, age, location, and political orientation. For each of the demographic attributes identified, we provide a database containing the platforms and digital traces examined, sample sizes, accuracy measures, and the classification methods applied. Finally, we discuss the main research trends, findings, and methodological approaches, and recommend directions for future research.
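The three inclusion criteria can be read as a simple screening predicate. The sketch below is illustrative only: the `CandidateStudy` record, its field names, and `meets_inclusion_criteria` are hypothetical and are not the authors' screening tool. Under this reading, criterion (iii) admits both visible and non-visible traces, so it excludes nothing beyond what criterion (i) already requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CandidateStudy:
    """Hypothetical record for one screened article (field names are illustrative)."""
    title: str
    attributes_predicted: List[str] = field(default_factory=list)  # e.g. ["gender", "age"]
    trace_types: List[str] = field(default_factory=list)           # e.g. ["tweets", "clickstreams"]
    automated: bool = False                                          # was the prediction automated?


def meets_inclusion_criteria(study: CandidateStudy) -> bool:
    """Hypothetical predicate applying criteria (i)-(iii) as worded in the abstract."""
    # (i) at least one demographic attribute inferred from at least one digital footprint
    has_attribute_and_trace = bool(study.attributes_predicted) and bool(study.trace_types)
    # (ii) the method of prediction must be automated
    is_automated = study.automated
    # (iii) visible (e.g. tweets) and non-visible (e.g. clickstreams) traces both qualify,
    # so any trace type already accepted under (i) passes; no extra check is needed.
    return has_attribute_and_trace and is_automated


candidates = [
    CandidateStudy("A", ["gender"], ["tweets"], automated=True),
    CandidateStudy("B", ["age"], ["clickstreams"], automated=False),  # fails (ii)
    CandidateStudy("C", [], ["smartphone logs"], automated=True),     # fails (i)
]
included = [s.title for s in candidates if meets_inclusion_criteria(s)]
print(included)  # ['A']
```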

Highlights

  • We use the internet and digital devices in many aspects of our lives: to communicate, work, shop, bank, and more.

  • In this article, we systematically review existing research to address three questions: (i) what demographic attributes can be predicted from digital traces? (ii) what traces and platforms have been studied? and (iii) how effective are current methodologies and predictions? In synthesising this information, we review current findings and offer recommendations for future research.

  • Our search identified 327 articles examining 14 demographic attributes: gender (n = 241), age (n = 157), location (n = 32), political orientation (n = 33), sexual orientation (n = 7), family and relationships (n = 19), ethnicity and race (n = 20), education (n = 16), income (n = 13), language (n = 9), health (n = 9), religion (n = 8), occupation (n = 22), and social class (n = 1).
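Because the per-attribute counts sum to more than the 327 articles identified, many articles evidently examined more than one attribute. A minimal Python tally of the figures reported above makes this explicit; the dictionary simply restates the highlights, and the percentages are computed here from those counts rather than reported in the paper.

```python
# Number of articles examining each attribute, as reported in the highlights.
counts = {
    "gender": 241, "age": 157, "location": 32, "political orientation": 33,
    "sexual orientation": 7, "family and relationships": 19, "ethnicity and race": 20,
    "education": 16, "income": 13, "language": 9, "health": 9, "religion": 8,
    "occupation": 22, "social class": 1,
}
TOTAL_ARTICLES = 327

total_pairs = sum(counts.values())   # article-attribute pairs, not distinct articles
print(total_pairs)                   # 587
print(total_pairs > TOTAL_ARTICLES)  # True: many articles examine several attributes

# Share of the 327 articles that examined each attribute, most-studied first.
for attribute, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{attribute:<26}{n:>4}  {n / TOTAL_ARTICLES:6.1%}")
```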


Introduction

We use the internet and digital devices in many aspects of our lives: to communicate, work, shop, bank, and more. With every click or online interaction, digital traces (also known as ‘digital footprints’) are created and captured, usually automatically, providing a detailed record of a person’s online activity. This constant generation of digital data makes it possible to harvest and analyse ‘big data’ at an unprecedented scale and to gain insights into an individual’s demographic attributes, personality, or behaviour. Such information can be highly valuable for organisations (e.g. marketers, researchers, governments) seeking to understand digital data and predict future outcomes. Numerous studies have accurately predicted demographic attributes from digital traces, including Facebook likes [9,10,11], smartphone logs [12,13,14,15], Flickr tags [16], and language-based features [17,18,19,20].
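To make the idea of predicting a demographic attribute from Facebook likes concrete, here is a minimal sketch under stated assumptions: each user is a binary vector over a handful of hypothetical pages (1 = liked), the target is a hypothetical binary attribute, and the classifier is plain logistic regression trained by gradient descent. The data are synthetic (produced by the hypothetical `make_user` function), so the resulting accuracy says nothing about real-world performance, and logistic regression is a swapped-in baseline rather than the method of any specific study cited above.

```python
import math
import random

N_PAGES = 6  # hypothetical pages: 0-2 skew toward label 1, 3-5 toward label 0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_user(label: int, rng: random.Random) -> list:
    """Toy assumption: a user likes 'their' pages with p=0.8 and the others with p=0.2."""
    likes = []
    for page in range(N_PAGES):
        favoured = (page < 3) == (label == 1)
        likes.append(1 if rng.random() < (0.8 if favoured else 0.2) else 0)
    return likes


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit P(y=1 | x) = sigmoid(w.x + b) by full-batch gradient descent on log-loss."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x) -> int:
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w, b, X, y) -> float:
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


rng = random.Random(42)                     # fixed seed for reproducibility
labels = [i % 2 for i in range(300)]        # balanced toy labels
X = [make_user(lbl, rng) for lbl in labels]
X_train, y_train = X[:200], labels[:200]
X_test, y_test = X[200:], labels[200:]

w, b = train_logistic_regression(X_train, y_train)
test_acc = accuracy(w, b, X_test, y_test)
print(f"held-out accuracy on toy data: {test_acc:.2f}")
```

Logistic regression is used here only because its weights are easy to inspect: a positive weight marks a page whose like pushes the prediction toward label 1, and a negative weight pushes it toward label 0.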
