Abstract

Automated classifiers (ACs), often built via supervised machine learning (SML), can categorize large, statistically powerful samples of data ranging from text to images and video. They have become widely popular measurement devices in communication science and related fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results when used in downstream statistical analyses, unless those analyses account for the errors. As we show in a systematic literature review of SML applications, communication scholars largely ignore misclassification bias. In principle, existing statistical methods can use “gold standard” validation data, such as that created by human annotators, to correct misclassification bias. We introduce and test such methods, including a new method we design and implement in the R package misclassification_models, via Monte Carlo simulations, which we also release, designed to reveal each method’s limitations. Based on our results, we recommend our new error correction method, as it is versatile and efficient. In sum, automated classifiers, even those that fall below common accuracy standards or make systematic misclassifications, can be useful for measurement given careful study design and appropriate error correction methods.
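To illustrate the kind of correction the abstract describes, the following is a minimal sketch in base R. It is not the paper’s new method and does not use the misclassification_models API (whose functions are not shown here); instead it applies the classic Rogan–Gladen estimator, which corrects a classifier’s naive prevalence estimate using sensitivity and specificity estimated from a human-coded validation subsample. All sample sizes, error rates, and data below are invented for illustration.

```r
# Sketch: correcting a classifier's estimated class prevalence with a
# "gold standard" validation subsample (Rogan-Gladen estimator).
# Simulated data only; not the paper's method.

set.seed(42)

# --- Simulate ground-truth labels and an imperfect classifier ---
n          <- 10000
true_label <- rbinom(n, 1, 0.30)   # true prevalence = 0.30
sens_true  <- 0.85                 # classifier sensitivity
spec_true  <- 0.90                 # classifier specificity
pred_label <- ifelse(true_label == 1,
                     rbinom(n, 1, sens_true),
                     rbinom(n, 1, 1 - spec_true))

# Naive estimate is biased (here upward: false positives among the
# many true negatives outweigh the false negatives)
naive_prev <- mean(pred_label)

# --- Estimate error rates from a small human-coded validation set ---
val_idx  <- sample(n, 500)         # human coders annotate 500 items
sens_hat <- mean(pred_label[val_idx][true_label[val_idx] == 1])
spec_hat <- 1 - mean(pred_label[val_idx][true_label[val_idx] == 0])

# --- Rogan-Gladen correction of the prevalence estimate ---
corrected_prev <- (naive_prev + spec_hat - 1) / (sens_hat + spec_hat - 1)

c(true = mean(true_label), naive = naive_prev, corrected = corrected_prev)
```

Wrapping this snippet in a loop over repeated simulated datasets gives a simple Monte Carlo check of each estimator’s bias, in the spirit of the simulations the abstract describes; the paper’s methods additionally target bias in downstream regression analyses, not just prevalence.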
