Abstract
In this paper we investigate the criterion validity of forced-choice comparisons of the quality of written arguments against normative solutions. Across two studies, novices and experts assessing the quality of reasoning through a forced-choice design were both able to choose arguments supporting more accurate solutions—62.2% (SE = 1%) of the time for novices and 74.4% (SE = 1%) for experts—and arguments produced by larger teams—up to 82% of the time for novices and 85% for experts—with high inter-rater reliability, namely 70.58% (95% CI = 1.18) agreement for novices and 80.98% (95% CI = 2.26) for experts. We also explored two methods for increasing efficiency. We found that the number of comparative judgments needed could be substantially reduced, with little loss of accuracy, by leveraging transitivity and producing quality-of-reasoning assessments with an AVL tree method. Moreover, a regression model trained to predict scores from automatically derived linguistic features of participants' judgments achieved a high correlation with the objective accuracy scores of the arguments in our dataset. Despite the inherent subjectivity involved in evaluating differing quality of reasoning, the forced-choice paradigm allows even novice raters to perform beyond chance and can provide a valid, reliable, and efficient method for producing quality-of-reasoning assessments at scale.
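The efficiency gain from transitivity can be illustrated with a minimal sketch. The paper describes an AVL tree method; the sketch below instead uses binary insertion into a sorted list, which yields the same per-item comparison count (O(log n) forced-choice judgments per new argument) and is simpler to show. The `prefer` function and the example scores are hypothetical stand-ins for human forced-choice judgments.

```python
# Sketch (not the paper's implementation): ranking arguments by binary
# insertion, so each new item needs only O(log n) forced-choice comparisons
# instead of being compared against every previously ranked item.

def rank_items(items, prefer, counter):
    """Insert items one by one into a ranking.

    `prefer(a, b)` stands in for a forced-choice judgment: True if a's
    reasoning is judged better than b's. `counter` tallies judgments used.
    """
    ranked = []  # best-to-worst
    for item in items:
        lo, hi = 0, len(ranked)
        while lo < hi:                      # binary search for insert slot
            mid = (lo + hi) // 2
            counter[0] += 1                 # one forced-choice comparison
            if prefer(item, ranked[mid]):
                hi = mid                    # item outranks ranked[mid]
            else:
                lo = mid + 1
        ranked.insert(lo, item)
    return ranked

# Hypothetical ground-truth accuracy scores standing in for judge behavior.
scores = {"A": 3, "B": 1, "C": 4, "D": 2, "E": 5}
counter = [0]
order = rank_items(list(scores), lambda a, b: scores[a] > scores[b], counter)
# Exhaustive pairwise comparison of 5 items would need 10 judgments;
# binary insertion ranks them with fewer.
```

A self-balancing tree such as an AVL tree achieves the same logarithmic bound while also keeping insertions cheap for large item pools, which is presumably why the paper adopts it at scale.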