Setting and Validating Multiple Standards on a Multistage‐Adaptive Test

Jennifer Lewis, Hwanggyu Lim, Frank Padellaro, April L. Zenisky, Stephen G. Sireci

doi:10.1111/emip.12434

Abstract

Setting cut scores on multistage‐adaptive tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists’ Angoff ratings into cut scores on the scale underlying an MST. The results suggest that the test characteristic function and item characteristic curve methods performed similarly, but the method based on dichotomizing panelists’ ratings at a response probability of .67 was unacceptable. The study featured a rating booklet design that allowed us to systematically evaluate the validity of the Angoff ratings across test levels, contributing internal validity evidence for the cut scores. The cut scores were also evaluated using procedural and external validity evidence. The implications of the results for future standard setting studies and research in this area are discussed.
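To make the first two mapping methods concrete, the following is a minimal sketch of how Angoff ratings might be converted to a cut score on the theta scale. It is not the authors' implementation. It assumes a two-parameter logistic (2PL) item response model, which the abstract does not specify. The function names and the item parameters in the usage example are hypothetical. In the test characteristic function (TCF) approach as sketched here, the sum of the ratings is treated as an expected raw score and the theta at which the TCF equals that sum is taken as the cut score. In the item characteristic curve (ICC) approach as sketched here, each rating is mapped to theta through its own item curve and the item-level thetas are averaged.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcf(theta, items):
    """Test characteristic function: expected raw score at theta."""
    return sum(icc_2pl(theta, a, b) for a, b in items)


def solve_increasing(func, target, lo=-6.0, hi=6.0, tol=1e-8):
    """Bisection search for func(theta) = target, with func increasing in theta."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


def cut_score_tcf(items, ratings):
    """TCF sketch: theta at which the expected raw score equals the rating sum."""
    return solve_increasing(lambda t: tcf(t, items), sum(ratings))


def cut_score_icc(items, ratings):
    """ICC sketch: invert each item's 2PL curve at its rating, then average."""
    thetas = [b + math.log(p / (1.0 - p)) / a for (a, b), p in zip(items, ratings)]
    return sum(thetas) / len(thetas)


if __name__ == "__main__":
    # Hypothetical (a, b) item parameters and mean Angoff ratings per item.
    example_items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]
    example_ratings = [0.70, 0.60, 0.45]
    print(cut_score_tcf(example_items, example_ratings))
    print(cut_score_icc(example_items, example_ratings))
```

Under this sketch, ratings that are exactly consistent with the item curves at a single theta yield the same cut score from both functions. The two can diverge when ratings are inconsistent across items, because the TCF approach aggregates on the raw-score scale while the ICC approach aggregates on the theta scale.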
