(Mis)Measuring People's Attitudes from Social Media

Indira Sen

doi:10.1145/3406865.3418363

Abstract

Activities of people, recorded via digital devices or online environments, offer increasingly comprehensive pictures of both individual and group-level behavior, potentially allowing inferences within and outside the platforms. These digital traces often take the form of textual units such as tweets or Reddit posts and comments. Compared to solicited survey responses, social media posts are the organic, unsolicited thoughts of people on a variety of topics, and the language in these posts is a key to their attitudes, beliefs, and values. Notwithstanding the many promises of digital traces, recent studies have begun to discuss the errors that can occur when digital traces are used to learn about social phenomena. In this thesis, I propose, first, to diagnose and characterize issues in the measurement of people's attitudes at scale and, second, to mitigate these errors through theory-driven solutions. To critically study and record errors and biases in using digital traces for measuring human behavior, we propose a systematic framework named the 'Total Error Framework for Digital Traces' (TED). TED is inspired by and adapted from the Total Survey Error Framework, developed and employed in survey methodology to assess the validity and reliability of survey-based studies. To mitigate errors unearthed by examining Computational Social Science through TED, we apply several domain-specific solutions, such as using linguistic theories to understand people's attitudes. This thesis contributes to improving the reliability and validity of attitude measurement from digital traces.

Full Text