Investigating Constructed‐Response Scoring Over Time: The Effects of Study Design on Trend Rescore Statistics

John R Donoghue, Catherine A McClellan, Melinda R Hess

doi:10.1002/ets2.12360

Abstract

When constructed‐response items are administered a second time, it is necessary to evaluate whether the raters at the current administration (Time B) have drifted from the scoring of the original administration (Time A). To study this, a sample of Time A papers is rescored by Time B scorers. The two sets of scores are commonly compared using the proportion of exact agreement and/or t‐statistics comparing the Time A mean to the Time B mean. These rescores are often analyzed with procedures that assume a multinomial sampling model, which is incorrect for this design. The correct model is product‐multinomial, reflecting the stratification on Time A scores. Using direct computation, this report demonstrates that both the proportion of exact agreement and the t‐statistic can deviate substantially from their expected behavior, producing misleading results. Reweighting the rescore table gives each statistic the correct expected value but does not guarantee that the usual sampling distributions hold. The results also apply to a wider class of situations in which a set of papers is scored by one group of raters or scoring engine and a sample is then selected for evaluation by a different group of raters or scoring engine.
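The point about expected values can be made concrete with a small calculation. The Python sketch below is illustrative only and rests on hypothetical assumptions that do not come from the report: four score points (0 to 3), invented Time A population proportions (`P_A`), invented probabilities that a Time B scorer assigns score j to a paper scored i at Time A (`P_B_GIVEN_A`), and a design that fixes the number of sampled papers per Time A score, so each row total of the rescore table is fixed (the product‐multinomial setting). The reweighting used here, population share divided by sample share for each Time A score, is one simple scheme chosen for illustration and is not necessarily the one used in the report. Exact fractions keep the comparisons exact.

```python
from fractions import Fraction as F

# Hypothetical population proportions of Time A scores 0..3 (assumption, not from the report).
P_A = [F(1, 10), F(2, 10), F(4, 10), F(3, 10)]

# Hypothetical P(Time B score = j | Time A score = i); row i, column j (assumption).
P_B_GIVEN_A = [
    [F(9, 10), F(1, 10), F(0), F(0)],
    [F(1, 10), F(8, 10), F(1, 10), F(0)],
    [F(0), F(2, 10), F(7, 10), F(1, 10)],
    [F(0), F(0), F(4, 10), F(6, 10)],
]


def expected_table(n_per_stratum):
    """Expected rescore counts when row totals are fixed by design (product-multinomial)."""
    return [[n * p for p in row] for n, row in zip(n_per_stratum, P_B_GIVEN_A)]


def row_weights(n_per_stratum):
    """Weight for each Time A stratum: population share divided by sample share."""
    total = sum(n_per_stratum)
    return [p * total / n for p, n in zip(P_A, n_per_stratum)]


def exact_agreement(table, weights=None):
    """(Weighted) proportion of papers whose Time B score equals the Time A score."""
    w = weights if weights is not None else [1] * len(table)
    total = sum(wi * sum(row) for wi, row in zip(w, table))
    diag = sum(wi * row[i] for i, (wi, row) in enumerate(zip(w, table)))
    return diag / total


def mean_difference(table, weights=None):
    """(Weighted) mean of Time B minus Time A score, the quantity a t-statistic tests."""
    w = weights if weights is not None else [1] * len(table)
    total = sum(wi * sum(row) for wi, row in zip(w, table))
    diff = sum(wi * count * (j - i)
               for i, (wi, row) in enumerate(zip(w, table))
               for j, count in enumerate(row))
    return diff / total


# Population targets: the expected values if the Time A population were rescored.
POP_AGREEMENT = sum(p * P_B_GIVEN_A[i][i] for i, p in enumerate(P_A))
POP_MEAN_DIFF = sum(p * q * (j - i)
                    for i, p in enumerate(P_A)
                    for j, q in enumerate(P_B_GIVEN_A[i]))

if __name__ == "__main__":
    n = [50, 50, 50, 50]  # hypothetical equal allocation across Time A scores
    table = expected_table(n)
    w = row_weights(n)
    print("population agreement:", POP_AGREEMENT)             # 71/100
    print("unweighted agreement:", exact_agreement(table))     # 3/4
    print("reweighted agreement:", exact_agreement(table, w))  # 71/100
    print("population mean diff:", POP_MEAN_DIFF)             # -3/20
    print("unweighted mean diff:", mean_difference(table))     # -1/10
    print("reweighted mean diff:", mean_difference(table, w))  # -3/20
```

Under equal allocation, the pooled table overstates agreement (0.75 versus 0.71) and understates the downward shift in means (-0.10 versus -0.15), because it over-represents the Time A score points where these hypothetical scorers agree more. With proportional allocation (here `[20, 40, 80, 60]`) the unweighted statistics already match the population values. The sketch addresses only expected values. As the abstract notes, reweighting does not by itself make the usual standard errors or t reference distributions valid.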

ETS Research Report Series

Similar Papers

Reliability Evidence for the NC Teacher Evaluation Process Using a Variety of Indicators of Inter-Rater Agreement
T Scott Holcomb ... Richard Lambert
Journal of Educational Supervision | VOL. 5
01 Jan 2021

Reliability of a Treatment-Based Classification System for Subgrouping People With Low Back Pain
Sharon M Henry ... Janice Y Bunn
Journal of Orthopaedic & Sports Physical Therapy | VOL. 42
07 Jun 2012

An approximate method for the direct calculation of radiation reaction
Jacques D Templin
American Journal of Physics | VOL. 66
01 May 1998

Tools for the Development of Macroeconomic Models of the Fuel and Energy Complex
Alexander M Lukatskii ... Galina V Fedorova
01 Oct 2018
