Abstract

With the rise of social media, a vast amount of new primary research material has become available to social scientists, but the sheer volume and variety of this material make it difficult to access through traditional approaches: close reading and the nuanced interpretation of manual qualitative coding and analysis. This paper sets out to bridge the gap by developing semi-automated replacements for manual coding through a mixture of crowdsourcing and machine learning, seeded by a careful manual coding scheme developed from a small sample of data. To show the promise of this approach, we attempt to create a nuanced categorisation of responses on Twitter to several recent high-profile deaths by suicide. Through these, we show that it is possible to code a large dataset automatically to a high degree of accuracy (71%), and we discuss the broader possibilities and pitfalls of using Big Data methods for Social Science.

Highlights

  • Social science has always had to find ways of moving between the small-scale, interpretative concerns of qualitative research and the large-scale, often predictive concerns of the quantitative

  • As a case study in applying semi-automated coding, this paper looks at public empathy – the expression of empathy that, even if it is imagined to be directed at one other person [2], can potentially be read by many – in the context of high-profile deaths by suicide

  • Whereas previous studies have looked at communal grief and individual mourning in untimely deaths such as that of Michael Jackson [18,21], this paper aims to interrogate discourses and practices around suicide in mediated mourning, an area in which there has been much less of a focus to date

Summary

Introduction

Social science has always had to find ways of moving between the small-scale, interpretative concerns of qualitative research and the large-scale, often predictive concerns of the quantitative. The application of traditional methods from qualitative social science, such as the close analysis of a small-scale sample of tweets relating to a public death, or the manual application of a coding frame to a larger volume of responses, is likely to miss crucial insights relating to volume, patterning or dynamics. We therefore develop a coding frame manually on a small sample and then scale it by asking crowdworkers to label a much larger sample. The quality of the crowd-generated labels is ensured by checking agreement among crowdworkers, and between the crowdworkers' labels and a golden set coded by trained researchers. This larger labelled dataset is then used to train a supervised machine learning model that automatically labels the entire dataset. Our tests show that the final machine-generated labels agree with the crowd labels with an accuracy of 71%, which permits nuanced interpretation. Although this is over 5.6 times the accuracy of a random baseline, we still need to reconcile the social side of research interpretation with the potentially faulty automatic classification. We allow for this by explicitly quantifying the errors in each of the labels, and by drawing only interpretations that still stand after allowing a margin of safety corresponding to these errors.
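
To make the pipeline concrete, the following is a minimal sketch, in Python with scikit-learn, of the three steps just described: validating crowd labels against a researcher-coded golden set, training a supervised classifier on the crowd-labelled sample, and comparing machine accuracy with a random baseline. The tweets, label names, and the choice of TF-IDF features with logistic regression are illustrative assumptions only; the actual coding frame, features and classifier used in the study are described in the sections listed below.

```python
# Illustrative sketch of the semi-automated coding pipeline (hypothetical data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.pipeline import make_pipeline

# 1) Check crowd labels against the researcher-coded "golden set".
gold_texts = [                                          # hypothetical tweets
    "so sad to hear this, thoughts with the family",
    "news report: singer found dead at home",
    "sending love to everyone affected by this",
]
gold_labels = ["empathy", "news", "empathy"]            # hypothetical coding frame
crowd_labels_on_gold = ["empathy", "news", "news"]      # hypothetical crowd answers
print("crowd vs gold accuracy:", accuracy_score(gold_labels, crowd_labels_on_gold))
print("crowd vs gold kappa:", cohen_kappa_score(gold_labels, crowd_labels_on_gold))

# 2) Train a supervised classifier on the (larger) crowd-labelled sample.
crowd_texts = gold_texts * 10          # stand-in for the bigger crowd-labelled set
crowd_labels = ["empathy", "news", "empathy"] * 10
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(crowd_texts, crowd_labels)

# 3) Apply the model to the full dataset and compare with a random baseline.
machine_labels = model.predict(gold_texts)
accuracy = accuracy_score(gold_labels, machine_labels)
n_categories = len(set(crowd_labels))
random_baseline = 1.0 / n_categories   # e.g. ~12.5% for an 8-category frame,
                                       # so 71% would be roughly 5.6x that baseline
print(f"machine accuracy {accuracy:.2f}, random baseline {random_baseline:.2f}")
```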

Related literature
Background and approach
Datasets
Analysis approach: semi-automated coding
Bootstrapping coding using manual effort
Coding typology using trained researchers
Scaling the coding using crowd-sourcing
Fine-tuning execution parameters
Validation of crowdsourced labels
Algorithm
Cross-validation
Manual validation
Analysing dynamics of public empathy
On interpreting semi-automated coding
Findings
Qualitative reading through semi-automated coding
