Abstract

Scoring clinical assessments reliably and validly against criterion-referenced standards remains an important issue and directly affects decisions about examinee proficiency. This generalizability study of students' clinical performance examination (CPX) scores examines the reliability of those scores and of their interpretation, particularly under a newly introduced "critical actions" criterion-referenced standard and scoring approach. The authors applied a generalizability framework to the performance scores of 477 third-year students attending three different medical schools in 2008. The norm-referenced standard included all station checklist items; the criterion-referenced standard included only those items a faculty panel deemed critical to patient care. The authors calculated and compared variance components and generalizability coefficients for each standard across six common stations. Norm-referenced scores had moderate generalizability (ρ = 0.51), whereas criterion-referenced scores showed low dependability (φ = 0.20). An estimated 63% of measurement error was associated with the person-by-station interaction, suggesting case specificity. Increasing the number of CPX stations from 6 to 24, an impractical solution in terms of both cost and time, would still yield only moderate dependability (φ = 0.50). Although performance assessment of complex skills such as clinical competence seems intrinsically valid, careful consideration of the scoring standard and approach is needed to avoid misinterpreting proficiency. Further study is needed to determine how best to improve the reliability of criterion-referenced scores, whether by changing the examination structure, the standard-setting process, or both.
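For context, the reported figures are consistent with the standard generalizability-theory dependability coefficient for a fully crossed person-by-station (p × s) design. The sketch below is illustrative only; the individual variance components are not reported in the abstract, and the notation (σ²_p for person variance, σ²_s for station variance, σ²_{ps,e} for the person-by-station interaction confounded with residual error, n_s for the number of stations) is assumed rather than taken from the study:

\[
\Phi(n_s) \;=\; \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_s + \sigma^2_{ps,e}}{n_s}}
\]

Under this formula, Φ(6) = 0.20 implies σ²_s + σ²_{ps,e} = 24 σ²_p, so quadrupling the station count gives Φ(24) = σ²_p / (σ²_p + 24 σ²_p / 24) = 0.50, matching the projected value in the abstract. The norm-referenced (relative) coefficient differs in that its error term contains only the interaction component, ρ(n_s) = σ²_p / (σ²_p + σ²_{ps,e}/n_s); note, however, that the reported ρ = 0.51 and φ = 0.20 come from different score sets (all checklist items versus critical items only), not from the same scores under two error definitions.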
