An Examination of the Replicability of Angoff Standard Setting Results Within a Generalizability Theory Framework

Jerome C Clauser,Melissa J Margolis,Brian E Clauser

doi:10.1111/jedm.12038

Abstract

Evidence of stable standard setting results over panels or occasions is an important part of the validity argument for an established cut score. Unfortunately, due to the high cost of convening multiple panels of content experts, standards often are based on the recommendation from a single panel of judges. This approach implicitly assumes that the variability across panels will be modest, but little evidence is available to support this assertion. This article examines the stability of Angoff standard setting results across panels. Data were collected for six independent standard setting exercises, with three panels participating in each exercise. The results show that although in some cases the panel effect is negligible, for four of the six data sets the panel facet represented a large portion of the overall error variance. Ignoring the often hidden panel/occasion facet can result in artificially optimistic estimates of the cut score stability. Results based on a single panel should not be viewed as a reasonable estimate of the results that would be found over multiple panels. Instead, the variability seen in a single panel can best be viewed as a lower bound of the expected variability when the exercise is replicated.

Full Text