Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods

Kokil Jaidka, Salvatore Giorgi, H Andrew Schwartz, Margaret L Kern, Lyle H Ungar, Johannes C Eichstaedt

doi:10.1073/pnas.1906364117

Abstract

Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.
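The abstract contrasts word-level dictionaries with supervised, data-driven models whose county estimates are compared with Gallup survey values (up to r = 0.64). The following is a minimal sketch of that evaluation idea. It is not the authors' pipeline. It assumes that each county is represented by relative word frequencies and uses ridge regression with k-fold cross-validation as the supervised model. The function names (`fit_ridge`, `cross_validated_r`, etc.) and the toy data are hypothetical.

```python
# Hypothetical sketch (not the authors' pipeline): supervised county-level
# well-being estimation with ridge regression on relative word frequencies,
# scored by the cross-validated Pearson r against survey values.
from typing import List, Sequence, Tuple

Model = Tuple[float, List[float]]  # (intercept, feature weights)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(X: List[List[float]], y: List[float], alpha: float = 1.0) -> Model:
    """Closed-form ridge fit on centred data; the intercept is not penalised."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    # Normal equations: (Xc^T Xc + alpha * I) w = Xc^T yc
    gram = [[sum(r[i] * r[j] for r in xc) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(xc, yc)) for i in range(p)]
    w = solve(gram, rhs)
    return my - sum(wj * mj for wj, mj in zip(w, mx)), w


def predict(model: Model, X: List[List[float]]) -> List[float]:
    b0, w = model
    return [b0 + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def cross_validated_r(X: List[List[float]], y: List[float],
                      k: int = 5, alpha: float = 1.0) -> float:
    """Out-of-fold prediction for every county, then Pearson r with the survey."""
    preds = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = fit_ridge([X[i] for i in train], [y[i] for i in train], alpha)
        for i, yhat in zip(test, predict(model, [X[i] for i in test])):
            preds[i] = yhat
    return pearson_r(preds, y)


if __name__ == "__main__":
    # Invented toy data: 20 "counties" x 2 word-frequency features.
    X = [[i / 20, (7 * i % 20) / 20] for i in range(20)]
    y = [3 * a - 2 * b + 0.01 * (3 * i % 5 - 2) for i, (a, b) in enumerate(X)]
    print(round(cross_validated_r(X, y, k=5, alpha=0.001), 3))
```

Scoring on out-of-fold predictions keeps each county's estimate independent of its own survey value. This is the property that makes a reported correlation with survey data meaningful rather than an artifact of overfitting.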

Highlights

Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations
We find that standard English word-level methods can yield estimates of county well-being that are inversely correlated with survey estimates, due to regional, cultural, and socioeconomic differences in language use
Among the word-level methods, higher positive emotion/valence estimated from Linguistic Inquiry and Word Count (LIWC) 2015, Affective Norms for English Words (ANEW), and Language Assessment by Mechanical Turk (LabMT) correlated with lower subjective well-being

Summary

Introduction

Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. Subjective well-being spans cognitive (i.e., life satisfaction), affective (positive and negative emotion), and eudaimonic dimensions (such as a sense of meaning and purpose) [3]. Most metrics are based on self-report surveys and interviews of individuals, which might be collected annually and aggregated to represent the well-being of regions or nations. We find that standard English word-level methods (such as the Positive emotion dictionary of Linguistic Inquiry and Word Count 2015 and Language Assessment by Mechanical Turk) can yield estimates of county well-being that are inversely correlated with survey estimates, due to regional, cultural, and socioeconomic differences in language use. Removing some of the most frequent misleading words can improve the accuracy of these word-level methods.
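Word-level methods score a county by how often its tweets use dictionary words. The sketch below illustrates why dropping a handful of high-frequency words can matter. It computes a LIWC-style relative frequency per county and can optionally remove the k most frequent dictionary words first. The toy lexicon, tweets, whitespace tokenizer, and function names are invented assumptions, not the LIWC or LabMT word lists. The paper's actual lexicons and weighting may differ; LabMT, for instance, assigns each word a happiness rating rather than a binary category membership.

```python
# Hypothetical sketch (not LIWC, ANEW, or LabMT themselves): a LIWC-style
# dictionary score per county, optionally dropping the k most frequent
# dictionary words across the whole corpus before scoring.
from collections import Counter
from typing import Dict, List, Set

Corpus = Dict[str, List[str]]  # county id -> list of tweets


def tokenize(text: str) -> List[str]:
    # Deliberately simple: lowercase + whitespace split (assumption).
    return text.lower().split()


def most_frequent_dictionary_words(corpus: Corpus, lexicon: Set[str], k: int) -> Set[str]:
    """The k dictionary words that occur most often across all counties."""
    counts = Counter(tok for tweets in corpus.values() for tweet in tweets
                     for tok in tokenize(tweet) if tok in lexicon)
    return {word for word, _ in counts.most_common(k)}


def dictionary_scores(corpus: Corpus, lexicon: Set[str], drop_top_k: int = 0) -> Dict[str, float]:
    """Share of each county's tokens that match the (pruned) dictionary."""
    removed = (most_frequent_dictionary_words(corpus, lexicon, drop_top_k)
               if drop_top_k > 0 else set())
    active = lexicon - removed
    scores = {}
    for county, tweets in corpus.items():
        tokens = [tok for tweet in tweets for tok in tokenize(tweet)]
        hits = sum(1 for tok in tokens if tok in active)
        # Denominator keeps all tokens, including the dropped words (assumption).
        scores[county] = hits / len(tokens) if tokens else 0.0
    return scores


if __name__ == "__main__":
    # Invented toy corpus and lexicon, for illustration only.
    lexicon = {"love", "good", "happy", "lol"}
    corpus = {"A": ["lol lol good day", "love this lol"],
              "B": ["happy to be here", "good news"]}
    print(dictionary_scores(corpus, lexicon))                # "A" scores higher
    print(dictionary_scores(corpus, lexicon, drop_top_k=1))  # ranking flips once "lol" is dropped
```

In this toy corpus, removing a single frequent word ("lol") is enough to flip which county looks happier. That is a miniature illustration of the sensitivity described above, not a reproduction of the paper's results.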



Journal: Proceedings of the National Academy of Sciences
Publication Date: Apr 27, 2020
Citations: 150
License type: CC BY 4.0

Similar Papers

Urinary concentrations of environmental contaminants and phytoestrogens in adults in Israel
T Berman ... I Grotto
Environment International | VOL. 59 | 18 Aug 2013

The Standardized Letter of Evaluation Narrative: Differences in Language Use by Gender.
Danielle Miller ... Abra Fant
Western Journal of Emergency Medicine | VOL. 20 | 17 Oct 2019

Health Utility Assessment Using EQ-5D among Caregivers of Children with Autism
Rahul Khanna ... John P Bentley
Value in Health | VOL. 16 | 18 Jun 2013

Clinician Word Use in Dementia Evaluation Reports as a Function of Cognitive Impairment.
Lauren B Flaherty ... Benjamin T Mast
Gerontology & geriatric medicine | VOL. 6 | 01 Jan 2020
