Abstract
Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.
Highlights
Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations
We find that standard English word-level methods can yield estimates of county well-being inversely correlated with survey estimates, due to regional, cultural, and socioeconomic differences in language use
Among the word-level methods, higher positive emotion/valence estimated from Linguistic Inquiry and Word Count (LIWC) 2015, Affective Norms of English Words (ANEW), and Language Assessment by Mechanical Turk (LabMT) correlated with lower subjective well-being
Summary
Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. Subjective well-being spans cognitive (i.e., life satisfaction), affective (positive and negative emotion), and eudaimonic dimensions (such as a sense of meaning and purpose) [3]; most metrics are based on self-report surveys and interviews of individuals, which might be collected annually and aggregated to represent the well-being of regions or nations. We find that standard English word-level methods (such as Linguistic Inquiry and Word Count 2015’s Positive emotion dictionary and Language Assessment by Mechanical Turk) can yield estimates of county well-being inversely correlated with survey estimates, due to regional, cultural, and socioeconomic differences in language use. Some of the most frequent misleading words can be removed to improve the accuracy of these word-level methods.
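The idea of scoring counties with a word-level lexicon, comparing the scores with survey estimates, and then dropping the most frequent lexicon words can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-word lexicon and its valence values, the county word counts, the survey scores, and the choice of a frequency-weighted mean as the scoring rule are all hypothetical and are not taken from LIWC, ANEW, LabMT, or the Gallup-Sharecare data.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon (made-up valence values, not real LIWC/LabMT scores).
LEXICON = {"lol": 7.0, "good": 7.5, "sad": 2.5}

# Hypothetical per-county word counts and survey well-being scores.
COUNTY_WORD_COUNTS = {
    "A": {"lol": 90, "good": 5, "sad": 5},
    "B": {"lol": 10, "good": 60, "sad": 30},
    "C": {"lol": 50, "good": 20, "sad": 30},
}
SURVEY_SCORES = {"A": 60.0, "B": 70.0, "C": 62.0}


def county_score(word_counts, lexicon, excluded=frozenset()):
    """Frequency-weighted mean valence of the lexicon words used in one county."""
    total = 0
    weighted = 0.0
    for word, count in word_counts.items():
        if word in lexicon and word not in excluded:
            total += count
            weighted += count * lexicon[word]
    return weighted / total if total else None


def most_frequent_lexicon_words(counties, lexicon, k):
    """The k lexicon words with the highest pooled count across all counties."""
    pooled = Counter()
    for counts in counties.values():
        for word, count in counts.items():
            if word in lexicon:
                pooled[word] += count
    return frozenset(word for word, _ in pooled.most_common(k))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlation_with_survey(excluded=frozenset()):
    """Correlate lexicon-based county scores with the survey scores."""
    names = sorted(COUNTY_WORD_COUNTS)
    scores = [county_score(COUNTY_WORD_COUNTS[n], LEXICON, excluded) for n in names]
    survey = [SURVEY_SCORES[n] for n in names]
    return pearson_r(scores, survey)


r_before = correlation_with_survey()
dropped = most_frequent_lexicon_words(COUNTY_WORD_COUNTS, LEXICON, k=1)
r_after = correlation_with_survey(excluded=dropped)
print(f"dropped={sorted(dropped)} r_before={r_before:.2f} r_after={r_after:.2f}")
```

In this toy setup, a single high-frequency word dominates the score of the county with the lowest survey value, so the lexicon score and the survey move in opposite directions until that word is excluded. The real analysis covers more than a thousand counties, and the toy numbers say nothing about the size of the effect reported there.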