Abstract

Rating scale development in the field of language assessment is often considered in dichotomous terms: It is assumed to be guided either by expert intuition or by performance data. Although several authors have argued that rating scale development is rarely so easily classifiable, this dyadic view has dominated language testing research for over a decade. In this paper we refine the dominant model of rating scale development by drawing on a corpus of 36 studies identified in a systematic review, and we present a model showing the different sources of the scale construct found in the corpus. In the discussion, we argue that rating scale designers, like test developers more broadly, need to begin by determining the purpose of the test, the policies that guide test development and score use, and the intended score use before weighing the design choices available to them. These choices include the impact of the chosen sources on the generalizability of the scores, the precision of the post-test predictions that can be made about test takers' future performances, and scoring reliability. The model's most important contribution is that it gives rating scale developers a framework to consider before beginning scale development and validation activities.
