Purpose: Systematic differences among raters' approaches to student assessment may result in leniency or stringency of assessment scores. This study examines the generalizability of medical student workplace-based competency assessments, including the impact of adjusting scores for rater leniency and stringency.

Methods: Data were collected from summative clerkship assessments completed for 204 students during 2017–2018 at a single institution. Generalizability theory was used to explore the variance attributed to different facets (rater, learner, item, and competency domain) through three unbalanced random-effects models, one per clerkship, including models that applied assessor stringency-leniency adjustments.

Results: In the original assessments, only 4–8% of the variance was attributed to the student, with the remainder being rater variance and error. Aggregating items to create a composite score increased the variance attributable to the student (5–13%). Applying a stringency-leniency ("hawk-dove") correction substantially increased the variance attributed to the student (14.8–17.8%) and improved reliability. Controlling for assessor leniency/stringency reduced measurement error, decreasing the number of assessments required for acceptable generalizability from 16–50 to 11–14.

Conclusions: Consistent with prior research, most of the variance in competency assessment scores was attributable to raters, with only a small proportion attributed to the student. Making stringency-leniency corrections using rater-adjusted scores improved the psychometric characteristics of the assessment scores.
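To make the link between variance components and the "number of assessments required" concrete, the sketch below shows a standard person × rater D-study calculation: given estimated variance components, it computes a generalizability (relative) or dependability (absolute) coefficient for n raters and finds the smallest n reaching a target reliability. The variance components and the 0.70 target are illustrative assumptions, not the study's estimates, and the formulas are the textbook G-theory coefficients rather than the authors' exact unbalanced models.

```python
def g_coefficient(var_person: float, var_rater: float, var_residual: float,
                  n_raters: int, absolute: bool = False) -> float:
    """Generalizability (relative) or dependability (absolute) coefficient
    for the mean of n_raters in a person x rater design."""
    error = var_residual / n_raters
    if absolute:
        # For absolute decisions, the rater main effect also counts as error.
        error += var_rater / n_raters
    return var_person / (var_person + error)


def raters_needed(var_person: float, var_rater: float, var_residual: float,
                  target: float = 0.70, absolute: bool = False,
                  max_n: int = 200) -> int | None:
    """Smallest number of assessments whose average reaches the target coefficient."""
    for n in range(1, max_n + 1):
        if g_coefficient(var_person, var_rater, var_residual, n, absolute) >= target:
            return n
    return None


# Illustrative numbers only: student variance is a small share of the total,
# as reported in the abstract; the rest is rater variance and residual error.
var_p, var_r, var_pr_e = 0.10, 0.35, 0.55
print(raters_needed(var_p, var_r, var_pr_e, target=0.70))            # relative decisions
print(raters_needed(var_p, var_r, var_pr_e, target=0.70, absolute=True))
```

Reducing the rater and residual components (as a stringency-leniency correction aims to do) shrinks the error term, which is why the required number of assessments drops in such a projection.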