A meta-analysis on the reliability of comparative judgement

San Verhavert, Renske Bouwer, Vincent Donche, Sven De Maeyer

doi:10.1080/0969594x.2019.1602027

Open Access

https://doi.org/10.1080/0969594x.2019.1602027

Abstract

Comparative Judgement (CJ) aims to improve the quality of performance-based assessments by letting multiple assessors judge pairs of performances. CJ is generally associated with high levels of reliability, but reliability also varies widely between assessments. This study investigates which assessment characteristics influence the level of reliability. A meta-analysis was performed on the results of 49 CJ assessments. Results show that the number of comparisons had an effect on the level of reliability. In addition, the probability of reaching an asymptote in the reliability, i.e., the point where a large effort is needed to increase the reliability only slightly, was larger for experts and peers than for novices. For a reliability of .70, between 10 and 14 comparisons per performance are needed. This rises to between 26 and 37 comparisons per performance for a reliability of .90.
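
To make the reported ranges concrete, the following minimal sketch converts comparisons per performance into a total number of pairwise judgements for an assessment. It assumes that each judgement involves two performances and therefore counts once for each of them, so the total is roughly half of the number of performances times the comparisons per performance. The function names and the lookup table are illustrative and do not come from the study.

```python
import math

# Ranges of comparisons per performance reported in the abstract,
# keyed by the target reliability level.
COMPARISONS_PER_PERFORMANCE = {0.70: (10, 14), 0.90: (26, 37)}


def total_comparisons(n_performances: int, per_performance: int) -> int:
    """Total pairwise judgements needed.

    Assumption: every judgement compares two performances and counts
    toward both, so the total is n * k / 2, rounded up.
    """
    return math.ceil(n_performances * per_performance / 2)


def judgement_budget(n_performances: int, reliability: float) -> tuple[int, int]:
    """Lower and upper total judgement counts for a target reliability."""
    low, high = COMPARISONS_PER_PERFORMANCE[reliability]
    return (
        total_comparisons(n_performances, low),
        total_comparisons(n_performances, high),
    )


if __name__ == "__main__":
    for target in (0.70, 0.90):
        print(target, judgement_budget(100, target))
```

Under that assumption, an assessment of 100 performances would need 500 to 700 judgements in total for a reliability of .70, and 1300 to 1850 for a reliability of .90.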

Full Text