Detecting Careless Responding in Survey Data Using Stochastic Gradient Boosting.

Ulrich Schroeders, Timo Gnambs, Christoph Schmidt

doi:10.1177/00131644211004708

Abstract

Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Different approaches have been proposed to detect aberrant responses such as probing questions that directly assess test-taking behavior (e.g., bogus items), auxiliary or paradata (e.g., response times), or data-driven statistical techniques (e.g., Mahalanobis distance). In the present study, gradient boosted trees, a state-of-the-art machine learning technique, are introduced to identify careless respondents. The performance of the approach was compared with established techniques previously described in the literature (e.g., statistical outlier methods, consistency analyses, and response pattern functions) using simulated data and empirical data from a web-based study, in which diligent versus careless response behavior was experimentally induced. In the simulation study, gradient boosting machines outperformed traditional detection mechanisms in flagging aberrant responses. However, this advantage did not transfer to the empirical study. In terms of precision, the results of both traditional and the novel detection mechanisms were unsatisfactory, although the latter incorporated response times as additional information. The comparison between the results of the simulation and the online study showed that responses in real-world settings seem to be much more erratic than can be expected from the simulation studies. We critically discuss the generalizability of currently available detection methods and provide an outlook on future research on the detection of aberrant response patterns in survey research.

Highlights

Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements
Monte Carlo simulations require a large number of specifications; we describe the most important ones below and refer readers to an Open Science Framework (OSF) repository (Soderberg, 2018) for the detailed specifications. The repository contains all data and syntax files, to foster transparency and reproducibility: https://osf.io/mct37
To evaluate the binary classification into careless respondents (CR) and regular respondents (RR), we report five performance metrics based on the number of correctly identified CR (true positives, TP), incorrectly identified CR (false positives, FP), correctly identified RR (true negatives, TN), and incorrectly identified RR (false negatives, FN): (a) sensitivity or true positive rate or recall (= TP/(TP + FN)), (b) specificity or true negative rate (= TN/(FP + TN)), (c) precision or positive predictive value (= TP/(TP + FP)), (d) accuracy (= (TP + TN)/(P + N), where P = TP + FN and N = FP + TN), and (e) the balanced accuracy, which is the mean of sensitivity and specificity.
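
The five metrics above follow directly from the four confusion counts. The following is a minimal sketch of that arithmetic, not the authors' code; the names `ConfusionCounts` and `classification_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four counts of a CR/RR classification."""
    tp: int  # careless respondents correctly flagged as careless
    fp: int  # regular respondents incorrectly flagged as careless
    tn: int  # regular respondents correctly left unflagged
    fn: int  # careless respondents incorrectly left unflagged


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(c):
    """Compute the five metrics from the confusion counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # TP / (TP + FN)
    specificity = _ratio(c.tn, c.fp + c.tn)   # TN / (FP + TN)
    precision = _ratio(c.tp, c.tp + c.fp)     # TP / (TP + FP)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fn + c.fp + c.tn)  # (TP + TN) / (P + N)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Made-up counts for illustration only; these are not results from the study.
example = ConfusionCounts(tp=40, fp=10, tn=130, fn=20)
for name, value in classification_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

Note that precision, unlike sensitivity and specificity, depends on the share of careless respondents in the sample, so it can be low even when the other two are high.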

Summary

Introduction

Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Various data screening methods have been proposed to identify careless respondents (Meade & Craig, 2012; Niessen et al., 2016), such as probing items that directly assess test-taking behavior (e.g., bogus items), auxiliary or paradata (e.g., response times), or data-driven techniques (e.g., Mahalanobis distance). The usefulness of probing items is still debated (Curran & Hauser, 2019), because their inclusion can result in negative spillover effects by irritating participants or introducing reactance. Empirical data from a web-based experiment, in which participants were instructed to display different types of test-taking behavior (regular, inattentive), are used to probe the usefulness of the machine learning algorithm (gradient boosted trees) relative to traditional techniques for detecting careless respondents.



Journal: Educational and Psychological Measurement
Publication Date: Apr 19, 2021
Citations: 38
License type: CC BY 4.0


Similar Papers

Unfaithful findings: identifying careless responding in addictions research.
Alexandra Godinho ... John A Cunningham
Addiction | VOL. 111
14 Dec 2015

Using Mokken scaling techniques to explore carelessness in survey research.
Stefanie Wind ... Yurou Wang
Behavior Research Methods | VOL. 55
21 Sep 2022

Careless Responding and Insufficient Effort Responding
Jason L Huang ... Zhonghao Wang
31 Aug 2021

Three Mahalanobis distances and their role in assessing unidimensionality.
Ke‐Hai Yuan ... Steven P Reise
British Journal of Mathematical and Statistical Psychology | VOL. 57
01 May 2004
