Abstract

Research at the intersection of machine learning and the social sciences has provided critical new insights into social behavior. At the same time, a variety of issues have been identified with the machine learning models used to analyze social data. These issues range from technical problems with the data used and features constructed, to problematic modeling assumptions, to limited interpretability, to the models' contributions to bias and inequality. Computational researchers have sought out technical solutions to these problems. The primary contribution of the present work is to argue that there is a limit to these technical solutions. At this limit, we must instead turn to social theory. We show how social theory can be used to answer basic methodological and interpretive questions that technical solutions cannot when building machine learning models, and when assessing, comparing, and using those models. In both cases, we draw on related existing critiques, provide examples of how social theory has already been used constructively in existing work, and discuss where other existing work may have benefited from the use of specific social theories. We believe this paper can act as a guide for computer and social scientists alike to navigate the substantive questions involved in applying the tools of machine learning to social data.

Highlights

  • Machine learning is increasingly being applied to vast quantities of social data generated from and about people (Lazer et al., 2009)

  • Scholars have argued that machine learning models applied to social data often fail to account for the myriad biases that arise during the analysis pipeline and can undercut the validity of study claims (Olteanu et al., 2016)

  • Similar critiques have been made by Jacobs and Wallach (2019), who argue that measurement theory, a domain of social theory concerned with the validity and reliability of different ways of measuring social constructs, provides a concrete and useful language with which different definitions of fairness, and the impacts of algorithms, can be assessed


Summary

INTRODUCTION

Machine learning is increasingly being applied to vast quantities of social data generated from and about people (Lazer et al., 2009). Scholars have argued that machine learning models applied to social data often fail to account for the myriad biases that arise during the analysis pipeline and can undercut the validity of study claims (Olteanu et al., 2016). We argue, and show, that at each step of the machine learning pipeline, problems arise that cannot be solved by technical means alone. We explain how social theory helps us address problems that arise throughout the process of building and evaluating machine learning models for social data.

RELATED WORK

THEORY IN
  Problem Selection and Framing
  Outcome Definition
  Data Selection
  Feature Engineering
  Annotation
  Model Construction

THEORY OUT
  Generalizability
  Parsimony
  Fairness

CONCLUSION
