Abstract

The rank‐ordering method for standard maintaining was designed to map a known cut‐score (e.g. a grade boundary mark) on one test to an equivalent point on the test score scale of another test, using holistic expert judgements about the quality of exemplars of examinees’ work (scripts). It is a novel application of an old technique (Thurstone’s paired comparison method for scaling psychological stimuli), and one that can be applied when the more familiar methods of statistical equating or item banking are not possible. How should a method like this be evaluated? If the correct mapping were known, then the outcome of a rank‐ordering exercise could be compared against that. However, in the contexts for which the method was designed, there is no ‘right answer’. This paper presents an evaluation of the rank‐ordering method in terms of its rationale, its psychological validity and the stability of the outcome when various factors incidental to the method are varied (e.g. the number of judges, the number of scripts to be ranked, methods of data modelling and analysis).
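To make the procedure concrete, the sketch below illustrates the general logic of a rank‐ordering exercise, not the paper’s own implementation or analysis choices. It assumes that judges rank small mixed packs of scripts from two tests, that rankings are unpacked into implicit paired comparisons, that scripts are placed on a common latent scale with a simple Thurstone Case V model (the paper may well use a different scaling model), and that the Test A cut‐score is carried across to Test B via linear fits of mark against latent measure. All script IDs, marks, rankings and the cut‐score are invented for demonstration.

```python
"""Illustrative sketch of a rank-ordering standard-maintaining exercise.

Assumptions (not from the paper): invented judge rankings, invented marks,
Thurstone Case V scaling, and linear mark-vs-measure fits per test.
"""
import numpy as np
from scipy.stats import norm

# 1. Hypothetical judge data: each pack lists script IDs best-first.
packs = [
    ["A3", "B2", "A1", "B1"],
    ["A3", "A2", "B2", "B1"],
    ["B3", "A3", "B2", "A1"],
    ["B3", "A2", "A1", "B1"],
    ["A3", "B3", "A2", "B2"],
]
marks = {"A1": 38, "A2": 45, "A3": 52, "B1": 35, "B2": 41, "B3": 50}
scripts = sorted(marks)
idx = {s: k for k, s in enumerate(scripts)}
n = len(scripts)

# 2. Unpack each ranking into the paired comparisons it implies.
wins = np.zeros((n, n))  # wins[i, j] = times script i was ranked above script j
for pack in packs:
    for a, hi in enumerate(pack):
        for lo in pack[a + 1:]:
            wins[idx[hi], idx[lo]] += 1

# 3. Thurstone Case V scaling: scale value = mean probit of win proportions.
comparisons = wins + wins.T
p = np.where(comparisons > 0, wins / np.where(comparisons > 0, comparisons, 1), 0.5)
p = np.clip(p, 0.05, 0.95)  # avoid infinite probits for 0/1 proportions
np.fill_diagonal(p, 0.5)
measure = norm.ppf(p).mean(axis=1)

# 4. Map the known Test A cut-score onto the Test B mark scale.
a_ids = [s for s in scripts if s.startswith("A")]
b_ids = [s for s in scripts if s.startswith("B")]
a_slope, a_int = np.polyfit([marks[s] for s in a_ids],
                            [measure[idx[s]] for s in a_ids], 1)
b_slope, b_int = np.polyfit([measure[idx[s]] for s in b_ids],
                            [marks[s] for s in b_ids], 1)

cut_a = 44                               # invented grade boundary on Test A
latent_at_cut = a_slope * cut_a + a_int  # latent quality at the Test A boundary
cut_b = b_slope * latent_at_cut + b_int  # equivalent mark on Test B

print(f"Equivalent Test B boundary: {cut_b:.1f}")
```

With realistic numbers of judges and scripts, the stability questions the paper raises (how many judges, how many scripts, which scaling and regression choices) correspond to varying the inputs and the modelling steps in a sketch like this and observing how much the mapped boundary moves.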
